(57) Abstract



The invention relates to GA4 homologue (GA4H) DNA and 
proteins encoded by GA4H DNA. GA4H is believed to be a 
member of the family of enzymes involved in the biosynthesis of 
the gibberellin family (GA) of plant growth hormones that promote 
various growth and developmental processes in higher plants, such 
as seed germination, stem elongation, flowering and fruiting. More 
specifically, the proteins encoded by the GA4H loci may have function(s)
similar to those of 3-β-hydroxylases. The invention also relates to vectors
containing the DNA and the expression of the protein encoded by 
the DNA of the invention in a host cell. Additional aspects of 
the invention are drawn to host cells transformed with the DNA 
or antisense sequence of the invention, the use of such host cells 
for the maintenance, or expression or inhibition of expression of the 
DNA of the invention and to transgenic plants containing DNA of the 
invention. Finally, the invention also relates to the use of the GA4 
homologues to alter aspects of plant growth. 
GA4 Homologue DNA, Protein and Methods of Use



Field of the Invention 

The invention relates to the field of molecular biology and plant growth 
hormones, and especially to gibberellin synthesis. 

Background of the Invention

Gibberellins (GA) are a large family of tetracyclic diterpenoid plant
growth hormones that promote various growth and developmental processes in
higher plants. These processes include promotion of cell division and extension,
seed germination, stem elongation, flowering and fruiting (Stowe, B.B. et al.,
Annu. Rev. Plant Physiol. 8:181-216 (1957); Graebe, J.E., Annu. Rev. Plant
Physiol. 38:419-465 (1987); Phillips et al., Plant Physiol. 108:1049-1057 (1995);
Xu et al., Proc. Natl. Acad. Sci. USA 92:6640-6644 (1995); Martin et al., Planta
200:159-166 (1996)). Genes that can alter GA biosynthesis or sensitivity have
had an impact on the development of new plant species and on agriculture in
general.

A number of GA responsive dwarf mutants have been isolated from
various plant species, such as maize, pea, and Arabidopsis (Phinney, B.O. et al.,
"Chemical Genetics and the Gibberellin Pathway in Zea mays L." in Plant
Growth Substances, ed. P.F. Wareing, New York: Academic (1982) pp. 101-110;
Ingram, T.J. et al., Planta 160:455-463 (1984); Koornneef, M., Arabidopsis Inf.
Serv. 15:17-20 (1978)). The dwarf mutants of maize (dwarf 1, dwarf 2, dwarf 3,
dwarf 5) have been used to characterize the maize GA biosynthesis pathway by
determining specific steps leading to biologically important metabolites (Phinney,
B.O. et al., "Chemical Genetics and the Gibberellin Pathway in Zea mays L." in
Plant Growth Substances, ed. P.F. Wareing, New York: Academic (1982) pp.
101-110; Fujioka, S. et al., Plant Physiol. 88:1367-1372 (1988)). Similar studies
have been done with the dwarf mutants from pea (Pisum sativum L.) (Ingram,
T.J. et al., Planta 160:455-463 (1984)). GA deficient mutants have also been
isolated from Arabidopsis (ga1, ga2, ga3, ga4, ga5) (Koornneef, M. et al.,
Theor. Appl. Genet. 58:257-263 (1980)). The Arabidopsis ga4 mutant, induced
by ethyl methanesulfonate (EMS) mutagenesis, is a germinating, GA responsive,
semidwarf plant whose phenotype can be restored to wild type by repeated
application of exogenous GA (Koornneef, M. et al., Theor. Appl. Genet.
58:257-263 (1980)).

The GA4 gene encodes a 3-β-hydroxylase in Arabidopsis thaliana. A
mutant allele (ga4) blocks the conversion to 3-β-hydroxy GAs, reducing the
endogenous levels of GA₁, GA₈ and GA₄ and increasing the endogenous levels
of GA₁₉, GA₂₀ and GA₉ (Talon, M. et al., Proc. Natl. Acad. Sci. USA 87:7983-7987
(1990)). The reduced levels of the 3-β-hydroxy GAs are the cause of the
semidwarf phenotype of the ga4 mutant. It has been suggested that the pea le
mutant also encodes an altered form of 3-β-hydroxylase (Ross, J.J. et al., Physiol.
Plant. 76:173-176 (1989)). The pea deactivation mutant, sln, causes an elongated
slender phenotype (Ross et al., Plant J. 7:512-523 (1995)). Thus, 3-β-hydroxylase
is clearly implicated in the process of plant growth.

Homologues of the GA4 gene (GA4H) that encode GA4-homologue
proteins (GA4H) are described in this application. Two specific homologues,
GA4H1 and GA4H2, are exemplified. High levels of sequence homology
between the GA4H1, GA4H2 and GA4 genes, as well as between the proteins
encoded by these genes, suggest that at least these two homologue proteins
(GA4H1 and GA4H2) may have functions, or catalyze reactions, in plants
similar to those of GA4. Thus, the GA4H proteins should be useful for plant
growth modulation.

Summary of the Invention 

The invention provides genes involved in gibberellin biosynthesis from
which one can express and obtain proteins useful for the regulation of plant
growth. Additionally, the invention provides for new DNA probes useful for
obtaining additional GA4 homologue genes and proteins. Lastly, this invention
provides methods of regulating plant growth.

The invention is first directed to GA4H DNA and proteins encoded by
GA4H DNA.

The invention is further directed to GA4H antisense DNA, and to the 
GA4H antisense RNA transcribed from it. 

The invention is further directed to vectors containing GA4H encoding
DNA and to the expression of GA4H proteins encoded by GA4H DNA in a host
cell.

The invention is further directed to vectors containing GA4H antisense 
DNA and to the expression of GA4H antisense RNA by the GA4H antisense 
DNA in a host cell. 

The invention is further directed to host cells transformed with a GA4H
encoding DNA of the invention, and to the use of such host cells for the
maintenance of GA4H DNA or expression of a GA4H protein of the invention.

The invention is further directed to host cells transformed with a GA4H
antisense DNA of the invention, and to the use of such host cells for the
maintenance of the GA4H DNA or expression of the GA4H antisense RNA of the
invention, as inhibitors of the expression of endogenous GA4H.

The invention is further directed to transgenic plants containing a
GA4H-encoding or GA4H antisense DNA of the invention.

The invention is further directed to a method for altering plant growth,
using a GA4H encoding or GA4H antisense DNA of the invention.

The invention is further directed to a method for altering plant growth, 
using a recombinantly made GA4H protein of the invention. 

Preferably, each of the above embodiments is directed to GA4H1 or 
GA4H2 or the cDNA or genomic DNA encoding the GA4 homologues, as well 
as the antisense DNA of GA4H1 or GA4H2. 

Brief Description of the Drawings 

Figure 1: Sequence of the GA4 cDNA (Chiang, H.H., et al., Plant Cell
7:195-201 (1995)) (SEQ ID Nos. 1, 2, 3 and 4). The figure shows the locations
from which DNA probes were generated. The underlined nucleotides (Unique
probes) (SEQ ID No. 3) indicate the region specific to the GA4 gene that was
used as a probe. Probes (Homologous probes) (SEQ ID No. 4) generated from
boxed nucleotides were used for isolation of the GA4 homologues.

Figure 2A-2C: DNA gel blots of Arabidopsis genomic DNA. Figure 2A
shows a blot that was hybridized to probes derived from the homologous region
of the GA4 gene (Figure 1) at low stringency (42°C). Figure 2B shows a blot that
was hybridized at low stringency to probes derived from the unique region of the
GA4 gene (Figure 1). Figure 2C shows a blot that was hybridized at high
stringency to probes derived from p3-1 (GA4H1 gene; Figure 3) DNA. DNA in
lanes 1, 2 and 3 was digested with HindIII, BamHI, and EcoRI, respectively. The
predicted sizes (in kilobase pairs; kbp) of the three major hybridizing bands are
shown on the left.

Figure 3: The restriction map of the genomic clone pLVN103 (ATCC
accession no. 98436; deposited at the American Type Culture Collection, 10801
University Boulevard, Manassas, VA 20110-2209, U.S.A., under the terms of the
Budapest Treaty), containing two linked homologues of GA4. The plasmid
pLVN103 contains the entire genomic insert from λ3 but was cloned into
pBSKS(+). Plasmid p3-1 is a subclone of λ3 and carries the 2.1 kb HindIII
fragment. This subclone contains most of the coding region of the GA4H1 gene.
The region containing both the GA4H1 and GA4H2 genes is shown in more detail
at the bottom of the figure. The arrows indicate the direction of transcription of
these genes. The line indicates the noncoding area, and rectangular boxes
represent the coding region of the DNA. Abbreviations: B, BamHI; H, HindIII.

Figure 4A-4B: Physical mapping of the GA4H1 and GA4H2 genes by
anchoring to mapped YACs. PCR amplification of the GA4H1 (with GA-P2
and GA-P6 primers) (See Figure 6) and GA4H2 (with GA-P19 and GA-P20
primers) (See Figure 8) genes (For the primer sequences, see Example 1). Figure
4A shows an ethidium bromide stained gel of the PCR product. Figure 4B shows
an autoradiograph of a DNA blot of the gel in Figure 4A using probes derived
from the genomic clone pLVN103. Primers GA-P19 and GA-P20 were used in
lanes 1-2 and 4-6, while primers GA-P2 and GA-P6 were used in lanes 7-8 and
10-12. Molecular weight markers (1 kb DNA ladder) were loaded in lanes 3 and
9. DNA templates are: genomic clone pLVN103 (lanes 2 and 8); YAC CIC6C3
(lanes 1 and 7) of chromosome 2; CIC1E4 (lanes 4 and 10); CIC6C10 (lanes 5
and 11); and CIC10A11 (lanes 6 and 12).

Figure 5: Nucleotide sequence (SEQ ID No. 5) of the GA4H1 RT-PCR
product (cDNA). The predicted start (ATG) and stop (taa) codons are present at
nucleotide nos. 44 and 1109, respectively. The intron is located at nucleotide no.
513 and is represented by a filled triangle (▼). Underlined nucleotides indicate
the start (ATG) and stop (taa) codons. Lower case nucleotides represent 5' and
3' untranslated regions. A "G" at nucleotide no. 1059, indicated with an asterisk
(*), does not agree with the genomic DNA at this position. The number on the left
indicates the nucleotide position.

Figure 6: The genomic sequence of the GA4H1 gene (SEQ ID No. 6).
Upper and lower case letters represent the coding and noncoding regions of the
gene, respectively. The predicted translated protein sequence (SEQ ID No. 7) is
shown below its corresponding nucleotide sequence. Arrows represent primers
used in either PCR or RT-PCR analyses. The nucleotide and the amino acid
positions are shown on the right.

Figure 7: Nucleotide sequence of the GA4H2 RT-PCR product (cDNA)
(SEQ ID No. 8). The predicted start (ATG) and stop (taa) codons are present at
sequence nos. 49 and 1190, respectively. The intron is located at sequence no.
518. The number on the left indicates the nucleotide position.

Figure 8: Genomic sequence of the GA4H2 gene (SEQ ID No. 9).
Upper and lower case letters represent the coding and noncoding regions of the
gene, respectively. The predicted translated protein sequence (SEQ ID No. 10)
is shown below its corresponding nucleotide sequence. Arrows represent primers
used in either PCR or RT-PCR analyses. The positions of the nucleotide and the
amino acid are shown on the right.

Figure 9: Alignment of GA4, GA4H1 and GA4H2 proteins. Both Pileup
and Prettybox (Genetics Computer Group, Madison, WI, U.S.A.) commands
were used to generate this alignment. The position of the amino acid is shown on
the right.

Figure 10: Amino acid sequence identity and similarity between GA4
(SEQ ID No. 2), GA4H1 (SEQ ID No. 7), GA4H2 (SEQ ID No. 10) and some
other related 2-oxoacid-dependent dioxygenases (2-ODD). The percentages of
sequence identity and similarity (in parentheses) were generated using the GAP
software of the GCG package. Shaded boxes indicate the putative GA4 gene
family in Arabidopsis. Abbreviations: GA5, Arabidopsis GA 20-oxidase (accession
number X83379); F3H, Zea mays flavanone-3-β-hydroxylase (accession number
U04434); FLS, potato flavonol synthase (accession number X92178); ANS, apple
anthocyanidin hydroxylase (accession number S33144); EFE, tobacco ethylene
forming enzyme (accession number Z29529). Accession numbers refer to
GENBANK.

Figure 11A-11B: GA4H1 gene expressed in the flowers and shoot
meristems. One-tenth of the PCR product of each sample was electrophoresed on
an agarose gel and then stained with ethidium bromide (Figure 11A). A DNA blot
of the gel in Figure 11A was probed with GA4H1 specific DNA (Figure 11B).
Primers GA-P13 and GA-P17 were used to amplify the 220 bp cDNA and 630
bp genomic DNA of the GA4H1 gene. Primers Tua4F/Tua4R were used as an
internal control that amplified the 320 bp cDNA of the α-tubulin 4 gene (TUA4).
DNA templates of pLVN115 (lane 1), pCD7 (lane 2), and pLVN103 (lane 3)
were used in the PCR amplification. First strand cDNA templates of floral shoots
(lane 5), leaves (lane 6), roots (lane 7), and siliques (lane 8) were subjected to
RT-PCR. The 123 bp BRL DNA marker is present in lane 4.

Figure 12A-12B: GA4H2 gene expressed predominantly in the roots.
One-tenth of the PCR product from each sample was separated on agarose gel and
then stained with ethidium bromide (Figure 12A). The DNA gel blot shown in
Figure 12A was probed with the GA4H2 specific probes (Figure 12B). Primers
GA-P18 and GA-P20 were used to amplify the 440 bp cDNA and 860 bp
genomic DNA of the GA4H2 gene. The same primer pair of the TUA4 gene was
also used as an internal control during the RT-PCR. RNA templates of siliques
(lane 1), roots (lane 2), leaves (lane 3), and floral shoots (lane 4) were subjected
to RT-PCR. DNA templates of pLVN103 (lane 6), pCD7 (lane 7), and pLVN107
(lane 8) were used in the PCR amplification. The 123 bp BRL DNA marker is
present in lane 5.






Figure 13. Phenotype of transgenic plants expressing the sense and 
antisense of the GA4H1 gene. 
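
The amplicon sizes quoted in the descriptions of Figures 11 and 12 allow a band to be assigned to a cDNA or a genomic template by size alone. The following is a minimal sketch of that arithmetic. It assumes that the difference between the genomic and cDNA products reflects only the intron sequence lying between the primers; the function names and the 30 bp tolerance are hypothetical and are not taken from the specification.

```python
# Expected amplicon sizes (bp) as quoted for Figures 11 and 12.
EXPECTED_BP = {
    "GA4H1": {"cDNA": 220, "genomic": 630},
    "GA4H2": {"cDNA": 440, "genomic": 860},
}


def spanned_intron_bp(gene: str) -> int:
    """Size difference between genomic and cDNA products.

    Assumption: the whole difference is intron sequence between the primers.
    """
    sizes = EXPECTED_BP[gene]
    return sizes["genomic"] - sizes["cDNA"]


def classify_band(gene: str, observed_bp: float, tolerance_bp: float = 30.0) -> str:
    """Assign an observed band to 'cDNA' or 'genomic', else 'unassigned'.

    tolerance_bp is an illustrative value, not a figure from the text.
    """
    for template, expected in EXPECTED_BP[gene].items():
        if abs(observed_bp - expected) <= tolerance_bp:
            return template
    return "unassigned"


if __name__ == "__main__":
    for gene in EXPECTED_BP:
        print(gene, "spanned intron (bp):", spanned_intron_bp(gene))
    print(classify_band("GA4H1", 225))
```

A band near the genomic size in an RT-PCR lane would, under the same assumption, point to genomic DNA carried over in the RNA preparation rather than to a transcript.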

Definitions

"GAₙ" (with a number subscripted) refers to the "gibberellin Aₙ"
compound. The chemical structures of some of the gibberellin Aₙ's are presented
in Moritz, T. et al., Planta 193:1-8 (1994). GA without a subscript, e.g. GA1,
refers to enzymes presumably involved in the gibberellin biosynthetic pathway.

Italicized, uppercase names, such as "GA4" or "GA4H," refer to the wild
type gene. Italicized, lowercase names such as "ga4" refer to the mutant gene.

Uppercase names, such as "GA4H," refer to the protein, DNA or RNA
encoded by a GA4H gene, while lowercase names, such as "ga4," refer to the
protein, DNA or RNA encoded by a mutant, such as the mutant ga4 gene.

GA4H refers to any GA4 homologue, while GA4H1 and GA4H2 refer
to the homologues of GA4 shown in Figures 6 and 8, or minor variations of these
homologues or their cDNAs (Figures 5 and 7). Such minor variations may
include, but are not limited to, substitution of conservative amino acids or
degenerate substitutions in the DNA encoding the amino acid sequence of
GA4H1 and GA4H2. Such variations may also be referred to as "substantially
similar" molecules.

A unique probe should be understood to be a probe that contains a DNA 
sequence unique to GA4 DNA and that can be used to pull out the GA4 DNA. A 
"unique" probe sequence is indicated in Figure 1 by underlining. A homologue 
probe contains a DNA sequence homologous to a sequence found in GA4 
homologue DNA. A "homologous" probe sequence is indicated in Figure 1 by 
the boxed nucleotide sequence and can be used to obtain GA4H DNA. 

Plant should be understood as referring to a multicellular differentiated 
organism capable of photosynthesis including angiosperms (monocots and dicots) 
and gymnosperms. 

Plant cell should be understood as referring to the structural and
physiological unit of plants. The term "plant cell" refers to any cell which is
either part of or derived from a plant. Some examples of cells encompassed by
the present invention include differentiated cells that are part of a living plant;
differentiated cells in culture; undifferentiated cells in culture; and the cells of
undifferentiated tissue such as callus or tumors.

Plant cell progeny should be understood as referring to any cell or tissue 
derived from plant cells including callus; plant parts such as stems, roots, fruits, 
leaves or flowers; plants; plant seed; pollen; and plant embryos. 

Propagules should be understood as referring to any plant material
capable of being sexually or asexually propagated, or being propagated in vivo or
in vitro. Such propagules preferably consist of the protoplasts, cells, calli, tissues,
embryos or seeds of the regenerated plants.

Transgenic plant should be understood as referring to a plant having
stably incorporated exogenous DNA (i.e. DNA not normally found) in its genetic
material. The term also includes exogenous DNA which may be introduced into
a cell or protoplast in various forms, including, for example, naked DNA in
circular, linear or supercoiled form, DNA contained in nucleosomes or
chromosomes or nuclei or parts thereof, DNA complexed or associated with other
molecules, DNA enclosed in liposomes, spheroplasts, cells or protoplasts.

Purified as it refers to preparations made from biological cells or hosts
should be understood to mean any cell extract containing the indicated DNA or
protein including a crude extract of the DNA or protein of interest. For example,
in the case of a protein, a purified preparation can be obtained following an
individual technique or a series of preparative or biochemical techniques and the
DNA or protein of interest can be present at various degrees of purity in these
preparations. The procedures may include for example, but are not limited to,
ammonium sulfate fractionation, gel filtration, ion exchange
chromatography, affinity chromatography, density gradient centrifugation and
electrophoresis.

A preparation of DNA or protein that is "pure" or "isolated" should be
understood to mean a preparation free from naturally occurring materials with 
which such DNA or protein is normally associated in nature. "Essentially pure" 
should be understood to mean a "highly" purified preparation that contains at 
least 95% of the DNA or protein of interest. 

A cell extract that contains the DNA or protein of interest should be 
understood to mean a homogenate preparation or cell-free preparation obtained 
from cells that express the protein or contain the DNA of interest. The term "cell 
extract" is intended to include culture media, especially spent culture media from 
which the cells have been removed. 

A fragment of a molecule should be understood as referring to a shortened
sequence of an amino acid or nucleotide sequence that retains one or more desired
chemical or biological properties of the full-length sequence such that use of the
full-length sequence is not necessary.

A functional derivative of GA4H (or GA4) should be understood as 
referring to a protein, or DNA encoding a protein, that possesses a biological 
activity that is substantially similar to the biological activity of GA4H (or GA4). 
A functional derivative may or may not contain post-translational modifications 
such as covalently linked carbohydrate, depending on the necessity of such 
modifications for the performance of a specific function. The term "functional 
derivative" is intended to include the "fragments," "variants," "analogues," or 
"chemical derivatives" of a molecule. The derivative retains at least one of the 
naturally-occurring functions of the parent gene or protein. The function can 
be any of the regulatory gene functions or any of the function(s) of the finally 
processed protein. The degree of activity of the function need not be 
quantitatively identical as long as the qualitative function is substantially 
similar. 

A mutation should be understood as referring to a detectable change in the 
genetic material which may be transmitted to daughter cells and possibly even to 
succeeding generations giving rise to mutant cells or mutant organisms. If the
descendants of a mutant cell give rise only to somatic cells in multicellular
organisms, a mutant spot or area of cells arises. Mutations in the germ line of 
sexually reproducing organisms may be transmitted by the gametes to the next 
generation resulting in an individual with the new mutant condition in both its 
somatic and germ cells. A mutation may be any (or a combination of) detectable, 
unnatural change affecting the chemical or physical constitution, mutability, 
replication, phenotypic function, or recombination of one or more 
deoxyribonucleotides; nucleotides may be added, deleted, substituted for, 
inverted, or transposed to new positions with and without inversion. Mutations 
may occur spontaneously and can be induced experimentally by application of 
mutagens. A mutant variation of a nucleic acid molecule results from a mutation. 
A mutant polypeptide may result from a mutant nucleic acid molecule. 

A species should be understood as referring to a group of actually or 
potentially interbreeding natural populations. A species variation within a nucleic 
acid molecule or protein is a change in the nucleic acid or amino acid sequence 
that occurs among species and may be determined by DNA sequencing of the 
molecule in question. 

A preparation that is substantially free of other A. thaliana DNA (or
protein) should be understood as referring to a preparation wherein the only A.
thaliana DNA (or protein) is that of the recited A. thaliana DNA (or protein).
Though proteins may be present in the sample which are homologous to other A.
thaliana proteins, the sample is still said to be substantially free of such other A.
thaliana DNA (or protein) as long as the homologous proteins contained in the
sample are not expressed from genes obtained from A. thaliana.

A DNA construct should be understood as referring to a recombinant, 
man-made DNA, linear or circular. 

T-DNA (transferred DNA) should be understood as referring to a segment 
or fragment of Ti (tumor-inducing) plasmid DNA which integrates into the plant 
nuclear DNA. 






Stringent hybridization conditions should be understood to be those 
conditions normally used by one of skill in the art to establish at least a 90% 
homology between complementary pieces of DNA or DNA and RNA. Lesser 
homologies, such as at least 70% homology or preferably at least 80% may also 
be desired and obtained by varying the hybridization conditions. 
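
The homology thresholds named above (at least 90%, 80% or 70%) can be illustrated with a simple percent-identity calculation over two aligned sequences. This is only a sketch: it assumes that "homology" is read as percent identity over ungapped aligned positions, and the function names are hypothetical. It does not model hybridization behaviour, which also depends on salt, temperature and sequence composition.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of ungapped aligned positions at which two sequences match.

    Both inputs must already be aligned to equal length; '-' marks a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


def meets_threshold(seq_a: str, seq_b: str, threshold: float = 90.0) -> bool:
    """True if identity is at least the given threshold (90% by default)."""
    return percent_identity(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ATGCATGCAT", "ATGCATGCAA"))
    print(meets_threshold("ATGCATGCAT", "ATGCTTGCAA", threshold=90.0))
```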

There are only three requirements for hybridization to a denatured strand 
of DNA to occur. (1) There must be complementary single strands in the sample. 
(2) The ionic strength of the solution of single-stranded DNA must be fairly high 
so that the bases can approach one another; operationally, this means greater than 
0.2M. (3) The DNA concentration must be high enough for intermolecular 
collisions to occur at a reasonable frequency. The third condition only affects the
rate, not whether renaturation/hybridization will occur.

Conditions routinely used by those of skill in the art are set out in readily
available procedure texts, e.g., Current Protocols in Molecular Biology, Vol. I,
Chap. 2.10, John Wiley & Sons, Publishers (1994) or Sambrook et al., Molecular
Cloning, Cold Spring Harbor (1989), incorporated herein by reference. As would
be known by one of skill in the art, the ultimate hybridization stringency reflects
both the actual hybridization conditions as well as the washing conditions
following the hybridization, and one of skill in the art would know the
appropriate manner in which to change these conditions to obtain a desired result.

For example, a prehybridization solution should contain sufficient salt and
nonspecific DNA to allow for hybridization to non-specific sites on the solid
matrix, at the desired temperature and in the desired prehybridization time. For
example, for stringent hybridization, such prehybridization solution could contain
6x sodium chloride/sodium citrate (1x SSC is 0.15 M NaCl, 0.015 M Na citrate;
pH 7.0), 5x Denhardt's solution, 0.05% sodium pyrophosphate and 100 µg per ml
of herring sperm DNA. An appropriate stringent hybridization mixture might
then contain 6x SSC, 1x Denhardt's solution, 100 µg per ml of yeast tRNA and
0.05% sodium pyrophosphate.
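
Because every SSC strength in this section is a multiple of the 1x definition given above (0.15 M NaCl, 0.015 M Na citrate), the molarity of any working solution follows by simple scaling. The sketch below shows that arithmetic. The helper names are hypothetical, and the 20x stock used in the example is a common laboratory convention assumed here for illustration; it is not stated in the text.

```python
# 1x SSC as defined in the text, in mol/L.
SSC_1X = {"NaCl": 0.15, "Na citrate": 0.015}


def ssc_molarity(fold: float) -> dict:
    """Molar concentration of each solute in an SSC solution of given strength."""
    return {solute: round(conc * fold, 6) for solute, conc in SSC_1X.items()}


def stock_volume_ml(stock_fold: float, target_fold: float, final_volume_ml: float) -> float:
    """Volume of concentrated stock needed to make final_volume_ml at target_fold."""
    if target_fold > stock_fold:
        raise ValueError("target strength cannot exceed stock strength")
    return final_volume_ml * target_fold / stock_fold


if __name__ == "__main__":
    print("6x SSC:", ssc_molarity(6))
    print("0.2x SSC:", ssc_molarity(0.2))
    # Assumed 20x stock, diluted to 100 ml of 6x SSC.
    print("ml of 20x stock:", stock_volume_ml(20, 6, 100))
```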

Alternative conditions for DNA-DNA analysis could entail the following: 

1) prehybridization at room temperature and hybridization at 68 °C;

2) washing with 0.2x SSC/0.1% SDS at room temperature;

3) as desired, additional washes at 0.2x SSC/0.1% SDS at 42 °C
(moderate-stringency wash); or

4) as desired, additional washes at 0.1x SSC/0.1% SDS at 68 °C
(high stringency).

Known hybridization mixtures, e.g., that of Church and Gilbert, Proc.
Natl. Acad. Sci. USA 81:1991-1995 (1984), comprising the following
composition may also be used: 1% crystalline grade bovine serum albumin/1 mM
EDTA/0.5 M NaHPO4, pH 7.2/7% SDS. Additional, alternative but similar
reaction conditions can also be found in Sambrook et al., Molecular Cloning,
Cold Spring Harbor (1989). Formamide may also be included in
prehybridization/hybridization solutions as desired.

It should be understood that these conditions are not meant to be definitive 
or limiting and may be adjusted as required by those of ordinary skill in the art to 
accomplish the desired objective. 

A vector should be understood to be a DNA element used as a vehicle for 
cloning or expressing a desired sequence, such as a gene of the invention, in a 
host. 

A host or host cell should be understood to be a cell in which a 
recombinant sequence, such as a sequence encoding a GA4H DNA of the 
invention, is incorporated and expressed. A GA4H gene of the invention or the 
antisense of the gene may be introduced into a host cell as part of a vector by 
transformation. Both the sense and the antisense DNA sequences are present in 
the same host cell since DNA is double stranded. The direction of transcription, 
however, as directed by an operably linked promoter as designed by the artisan, 
dictates which of the two strands is ultimately transcribed into RNA. 
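
The relationship between the sense and antisense transcripts described above can be sketched in a few lines. The example assumes the usual convention that a sequence is written 5' to 3' as its coding (sense) strand; the function names and the six-nucleotide input are hypothetical and are not sequences from this specification.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand of a DNA sequence, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def sense_rna(coding_strand: str) -> str:
    """RNA produced when the promoter drives the gene in its native orientation.

    The transcript matches the coding strand, with U in place of T.
    """
    return coding_strand.upper().replace("T", "U")


def antisense_rna(coding_strand: str) -> str:
    """RNA produced when the promoter drives transcription of the other strand."""
    return sense_rna(reverse_complement(coding_strand))


if __name__ == "__main__":
    fragment = "ATGGCA"  # illustrative only
    print("sense RNA:    ", sense_rna(fragment))
    print("antisense RNA:", antisense_rna(fragment))
```

The two transcripts are complementary to one another, which is why an antisense RNA can pair with, and so interfere with, the endogenous message.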



Detailed Description 

The process for genetically engineering GA4H protein sequences, 
according to the invention, is facilitated through the cloning of genetic sequences
that are capable of encoding GA4H proteins and through the expression of such 
genetic sequences. As used herein, the term "genetic sequence" is intended to 
refer to a nucleic acid sequence (preferably DNA). Genetic sequences that are 
capable of encoding GA4H proteins can be derived from a variety of sources. 
These sources include genomic DNA, RNA, cDNA, synthetic DNA, and 
combinations thereof. The preferred source of the GA4H genomic DNA is a 
plant genomic library and most preferably an Arabidopsis genomic library. A 
more preferred source of the GA4H cDNA is a plant cDNA library and most 
preferably an Arabidopsis cDNA library made from silique mRNA, although the 
message is ubiquitously expressed in the root, leaf and flower of plants. This 
invention, however, is not meant to be limited to GA4H homologues from only 
the plant genus Arabidopsis. 

Methods for obtaining and screening genomic libraries are well known in 
the art. An example of obtaining and screening a genomic library which is not 
meant to be limiting follows. Additional methods may be found in Example 1 of 
the specification. 

One may begin with a CsCl DNA preparation and partially digest it with
Sau3AI. After digestion, a partial fill-in reaction is performed. The reaction
mixture for the partial fill-in is as follows:

40 µl DNA
6 µl Sau3AI buffer (10X)
2.5 µl 0.1 M DTT
1 µl 100 mM dATP
1 µl 100 mM dGTP
5 µl Klenow enzyme
4.5 µl H₂O
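
As a quick consistency check on the partial fill-in mixture listed above, the component volumes can be summed and the final concentration of each stock worked out. The sketch below assumes the component list is read as 0.1 M DTT and 100 mM dATP/dGTP; the dictionary and function names are hypothetical.

```python
# Component volumes (µl) of the partial fill-in reaction, as listed in the text.
FILL_IN_MIX_UL = {
    "DNA": 40.0,
    "Sau3AI buffer (10X)": 6.0,
    "0.1 M DTT": 2.5,
    "100 mM dATP": 1.0,
    "100 mM dGTP": 1.0,
    "Klenow enzyme": 5.0,
    "H2O": 4.5,
}


def total_volume_ul(mix: dict) -> float:
    """Total reaction volume in µl."""
    return sum(mix.values())


def final_concentration(stock_conc: float, added_ul: float, total_ul: float) -> float:
    """Concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * added_ul / total_ul


if __name__ == "__main__":
    total = total_volume_ul(FILL_IN_MIX_UL)
    print("total volume (µl):", total)
    # 10X buffer diluted into the full reaction volume.
    print("buffer strength (X):", final_concentration(10.0, 6.0, total))
    # Each nucleotide stock, assuming 100 mM stocks.
    print("dATP (mM):", round(final_concentration(100.0, 1.0, total), 3))
```

Under these readings the volumes add up so that the 10X buffer ends at 1X, which supports reading the garbled volume units as microlitres.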

After 30 minutes at 37 °C the reaction is terminated with phenol-chloroform and
the DNA is obtained. The DNA is then loaded on a 0.7% low melting point
agarose gel and after electrophoresing, bands between 10 and 23 kb are cut out
from the gel. The gel with the cut-out bands is then melted at 67 °C. The isolated
DNA is then placed in the following ligation mixture:

2 µl Lambda Fix II, pre-digested arms (2 µg)
1 µg genomic DNA, partial fill-in
0.5 µl 10x ligation buffer
0.5 µl 10 mM ATP (pH 7.05)
0.5 µl T4 DNA ligase
~1.5 µl H₂O (to 5 µl final volume)

Following ligation overnight at 4°C, the DNA is packaged using GIGAPACK II
GOLD.

Plaque lifts are made using Hybond filters (Amersham Corp.), which are
then autoclaved for 2 min. Filters are hybridized with probes as described for
DNA and RNA gel blot analysis below.

Bacteriophage λ DNA is prepared from ER1458 lysates according to the
mini-prep method of Grossberger, D., Nucl. Acids Res. 15:6737 (1987). DNA
fragments are subcloned into pBluescript KS vectors (Stratagene) and used to
transform JM109.

Double stranded DNA is isolated from plasmid clones and purified by
CsCl banding. Sequencing is performed using the ABI PRISM dye terminator
cycle sequencing kit and the products are separated and detected on the ABI 377
(Perkin Elmer). Sequence analysis is performed using the Sequence Analysis
Software package (Genetics Computer Group, Inc., Madison, WI) and the Blast
network service of the National Center for Biotechnology Information (Bethesda,
MD).

Electrophoresis of DNA is in Tris-Acetate-EDTA buffer with subsequent
transfer in 25 mM NaHPO4 to Biotrans filters (International Chemical and
Nuclear Corp.). Electrophoresis of RNA samples is in agarose gels containing
RNAase inhibitor using MOPS/EDTA buffer, and the RNA is transferred to filters
as for DNA. Filters are UV-crosslinked using a Stratalinker (Stratagene) and baked
for 1 hr at 80 °C.

Radioactive probes are separated from unincorporated nucleotides using
a 1-ml Sephadex G-50 spin column and denatured in a microwave oven (Stroop,
W.G. et al., Anal. Biochem. 182:222-225 (1989)). Prehybridization for 1 hr and
hybridization overnight are performed at 65 °C in the hybridization buffer
described by Church, G.M. et al., Proc. Natl. Acad. Sci. USA 81:1991-1995
(1984). Filters are washed once for 15 min in 2x SSC at room temperature, then
two times for 30 min in 0.1x SSC and 0.1% SDS at 60 °C. The damp filters are
autoradiographed at -80 °C using intensifying screens. Filters are stripped twice
in 2 mM Tris-HCl, pH 8.0, 1 mM EDTA, 0.2% SDS at 70 °C for 30 min prior to
reprobing (Church, G.M. et al., Proc. Natl. Acad. Sci. USA 81:1991-1995
(1984)).

The recombinant GA4H cDNA of the invention will not include naturally
occurring introns if the cDNA is made using mature GA4H mRNA as a template.
Genomic DNA may or may not include naturally occurring introns. Moreover,
such genomic DNA may be obtained in association with the homologous (isolated
from the same source; native) 5' promoter region of the GA4H gene sequences
and/or with the homologous 3' transcriptional termination region. Further, such
genomic DNA may be obtained in association with the genetic sequences that
provide the homologous 5' non-translated region of the GA4 mRNA and/or with
the genetic sequences which provide the homologous 3' non-translated region.

Due to the degeneracy of nucleotide coding sequences, and to the fact that
the DNA code is known, all other DNA sequences which encode the same amino
acid sequence as depicted, for example, in Figure 6 [SEQ ID No. 7] can be
determined and used in the practice of the present invention. Additionally, those
sequences that hybridize to, for example, a GA4H sequence such as SEQ ID
Nos. 5 or 6, under stringent conditions are also useful in the practice of the
present invention.
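
As a minimal illustrative sketch (not part of the disclosed methods), the
degeneracy described above can be made concrete by enumerating every DNA
sequence that encodes a given short peptide under the standard genetic code.
The peptide used here is a hypothetical three-residue example, not SEQ ID
No. 7, and the function names are assumptions introduced for illustration.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse map: amino acid (one-letter code) -> list of synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(SYNONYMS[aa]) for aa in peptide)


def enumerate_encodings(peptide):
    """List every DNA sequence that encodes the peptide (short peptides only)."""
    choices = (SYNONYMS[aa] for aa in peptide)
    return ["".join(codons) for codons in product(*choices)]


if __name__ == "__main__":
    peptide = "MWK"  # hypothetical example peptide: Met-Trp-Lys
    print(count_encodings(peptide))
    for dna in enumerate_encodings(peptide):
        print(dna)
```

The count grows multiplicatively with peptide length, so full enumeration is
only practical for short stretches; for a full-length protein one would
normally compute the count or sample variants rather than list them all.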

A DNA sequence encoding GA4H protein or GA4H antisense RNA can
be inserted into a DNA vector in accordance with conventional techniques,
including blunt-ending or staggered-ending termini for ligation, restriction
enzyme digestion to provide appropriate termini, filling in of cohesive ends as
appropriate, alkaline phosphatase treatment to avoid undesirable joining, and
ligation with appropriate ligases. In one embodiment of the invention, expression
vectors are provided that are capable of expressing GA4H mRNA or antisense
RNA. Vectors for propagating a given sequence in a variety of host systems are
well known and can readily be altered by one of skill in the art such that the
vector will contain DNA or RNA encoding the desired genetic sequence and will
be propagated in a desired host. Such vectors include plasmids and viruses and
such hosts include eukaryotic organisms and cells, for example plant, yeast,
insect, mouse or human cells, and prokaryotic organisms, for example E.
coli and B. subtilis. Shuttle vectors in which the desired genetic sequence is
"maintained" in an available form before being extracted and transformed into a
second host for expression are also useful DNA constructs envisioned as carrying
the DNA of the invention.

A nucleic acid molecule, such as DNA, is said to be "capable of 
expressing" a polypeptide or antisense sequence if it contains a nucleotide 
sequence that encodes such polypeptide or antisense sequence and transcriptional 
and, if necessary, translational regulatory information operably linked to the 
nucleotide sequences that encode the polypeptide or antisense sequence. 

Two DNA sequences (such as a promoter region sequence and the GA4H 
gene encoding or antisense sequence) are said to be operably linked if the nature 
of the linkage between the two DNA sequences does not (1) result in the 
introduction of a frame-shift mutation, (2) interfere with the ability of the 
promoter region sequence to direct the transcription of the desired sequence, or 
(3) interfere with the ability of the desired sequence to be transcribed by the 
promoter region sequence. Thus, a promoter region would be operably linked to 
a desired DNA sequence if the promoter were capable of effecting transcription 
of that DNA sequence.

In one embodiment of the invention, a vector is employed that is capable
of integrating the desired gene sequences into the host cell chromosome. Cells
that have stably integrated the introduced DNA into their chromosomes can be
selected by also introducing one or more markers which allow for selection of
host cells which contain the expression vector. The marker may provide for
prototrophy to an auxotrophic host, biocide resistance, e.g., antibiotics, or heavy
metals, such as copper, or the like. The selectable marker gene sequence can
either be directly linked to the DNA gene sequences to be expressed, or
introduced into the same cell by co-transfection.

In another embodiment, the introduced sequence will be incorporated into 
a plasmid or viral vector capable of autonomous replication in the recipient host. 
Any of a wide variety of vectors may be employed for this purpose. Factors of 
importance in selecting a particular plasmid or viral vector include: the ease with 
which recipient cells that contain the vector may be recognized and selected from 
those recipient cells which do not contain the vector; the number of copies of the 
vector which are desired in a particular host; and whether it is desirable to be able 
to "shuttle" the vector between host cells of different species. 

The present invention also encompasses the expression of the GA4H
protein (or a functional derivative thereof) in either prokaryotic or eukaryotic
cells. Preferred prokaryotic hosts include bacteria such as E. coli, Bacillus,
Streptomyces, Pseudomonas, Salmonella, Serratia, etc. The most preferred
prokaryotic host is E. coli. Bacterial hosts of particular interest include E. coli
K12 strain 294 (ATCC 31446), E. coli χ1776 (ATCC 31537), E. coli W3110 (F-,
lambda-, prototrophic (ATCC 27325)), and other enterobacteria such as
Salmonella typhimurium or Serratia marcescens, and various Pseudomonas
species. Under such conditions, the GA4H gene product will not be glycosylated.
The prokaryotic host must be compatible with the replicon and control sequences
in the expression plasmid.

Hosts can be utilized for production of the desired genetic sequence, or
GA4H protein, using conventional methods, such as by growth in shake flasks,
fermentors, tissue culture plates or bottles. Alternatively, multicellular organisms
such as a plant might be used.

DNA encoding the desired protein is preferably operably linked to a 
promoter region, a transcription initiation site, and a transcription termination 
sequence, functional in plants. Any of a number of promoters which direct 
transcription in a plant cell is suitable. The promoter can be either constitutive 
or inducible. Some examples of promoters functional in plants include the 
nopaline synthase promoter and other promoters derived from native Ti plasmids, 
viral promoters including the 35S and 19S RNA promoters of cauliflower mosaic
virus (Odell et al., Nature 313:810-812 (1985)), and numerous plant promoters.

Alternative promoters that may be used include nos, ocs, and CaMV
promoters. Overproducing plant promoters may also be used. Such promoters,
operably linked to the GA4H gene, should increase the expression of the GA4H
protein. Overproducing plant promoters that may be used in this invention
include the promoter of the small subunit (ss) of ribulose-1,5-bisphosphate
carboxylase from soybean (Berry-Lowe et al., J. Molecular and App. Gen. 1:483-
498 (1982)), and the promoter of the chlorophyll a/b binding protein. These two
promoters are known to be light-induced in eukaryotic plant cells (see, for
example, Genetic Engineering of Plants, an Agricultural Perspective, A.
Cashmore, Plenum, New York 1983, pages 29-38; Coruzzi, G. et al., J. of Biol.
Chem. 258:1399 (1983); and Dunsmuir, P. et al., J. of Mol. and Applied Genet.
2:285 (1983)).

To express the GA4H gene (or a functional derivative thereof) in a
prokaryotic cell (such as, for example, E. coli, B. subtilis, Pseudomonas,
Streptomyces, etc.), it is necessary to operably link the GA4H gene encoding
sequence to a functional prokaryotic promoter. Such promoters may be either
constitutive or, more preferably, regulatable (i.e., inducible or derepressible).
Examples of constitutive promoters include the int promoter of bacteriophage λ,
the bla promoter of the β-lactamase gene sequence of pBR322, and the CAT
promoter of the chloramphenicol acetyl transferase gene sequence of pBR325,
etc. Examples of inducible prokaryotic promoters include the major right and left
promoters of bacteriophage λ (PL and PR), the trp, recA, lacZ, lacI, and gal
promoters of E. coli, the α-amylase (Ulmanen, I., et al., J. Bacteriol. 162:176-182
(1985)) and the σ-28-specific promoters of B. subtilis (Gilman, M.Z., et al., Gene
sequence 32:11-20 (1984)), the promoters of the bacteriophages of Bacillus
(Gryczan, T.J., In: The Molecular Biology of the Bacilli, Academic Press, Inc.,
NY (1982)), and Streptomyces promoters (Ward, J.M., et al., Mol. Gen. Genet.
203:468-478 (1986)).

Prokaryotic promoters are reviewed by Glick, B.R. (J. Ind. Microbiol.
1:277-282 (1987)); Cenatiempo, Y. (Biochimie 68:505-516 (1986)); and
Gottesman, S. (Ann. Rev. Genet. 18:415-442 (1984)).

Proper expression in a prokaryotic cell also requires the presence of a
ribosome binding site upstream of the gene sequence-encoding sequence. Such
ribosome binding sites are disclosed, for example, by Gold, L., et al. (Ann. Rev.
Microbiol. 35:365-404 (1981)).

Preferred eukaryotic hosts include yeast, fungi, insect cells, and mammalian
cells, either in vivo or in tissue culture. Mammalian cells that can be useful as
hosts include cells of fibroblast origin such as VERO or CHO-K1, or cells of
lymphoid origin, such as the hybridoma SP2/0-AG14 or the myeloma P3x63Sg8,
and their derivatives. Preferred mammalian host cells include SP2/0 and J558L,
as well as neuroblastoma cell lines such as IMR 332 that may provide better
capacities for correct post-translational processing.

For a mammalian host, several possible vector systems are available for 
the expression of the GA4H gene. A wide variety of transcriptional and 
translational regulatory sequences may be employed, depending upon the nature 
of the host. The transcriptional and translational regulatory signals may be 
derived from viral sources, such as adenovirus, bovine papilloma virus, Simian 
virus, or the like, where the regulatory signals are associated with a particular 
gene sequence which has a high level of expression. Alternatively, promoters 
from mammalian expression products, such as actin, collagen, myosin, etc., may
be employed. Transcriptional initiation regulatory signals may be selected that
allow for repression or activation, so that expression of the gene sequences can
be modulated. Of interest are regulatory signals which are temperature-sensitive
so that by varying the temperature, expression can be repressed or initiated, or are
subject to chemical (such as metabolite) regulation.

Yeast provides substantial advantages in that it can also carry out
post-translational peptide modifications. A number of recombinant DNA strategies
exist that utilize strong promoter sequences and high copy number plasmids that
can be utilized for production of the desired proteins in yeast. Yeast recognizes
leader sequences on cloned mammalian gene sequence products and secretes
peptides bearing leader sequences (i.e., pre-peptides).

Any of a series of yeast gene sequence expression systems incorporating
promoter and termination elements from the actively expressed gene sequences
coding for glycolytic enzymes produced in large quantities when yeast are grown
in medium rich in glucose can be utilized. Known glycolytic gene sequences can
also provide very efficient transcriptional control signals. For example, the
promoter and terminator signals of the phosphoglycerate kinase gene sequence
can be utilized.

Another preferred host is insect cells, for example the Drosophila larvae.
Using insect cells as hosts, the Drosophila alcohol dehydrogenase promoter can
be used (Rubin, G.M., Science 240:1453-1459 (1988)). Alternatively,
baculovirus vectors can be engineered to express large amounts of the GA4H gene
in insect cells (Jasny, B.R., Science 238:1653 (1987); Miller, D.W., et al., in
Genetic Engineering (1986), Setlow, J.K., et al., eds., Plenum, Vol. 8, pp. 277-
297).

As discussed above, expression of the GA4H gene in eukaryotic hosts
requires the use of eukaryotic regulatory regions. Such regions will, in general,
include a promoter region sufficient to direct the initiation of RNA synthesis.
Preferred eukaryotic promoters include the promoter of the mouse
metallothionein I gene sequence (Hamer, D., et al., J. Mol. Appl. Gen. 1:273-288
(1982)); the TK promoter of Herpes virus (McKnight, S., Cell 31:355-365
(1982)); the SV40 early promoter (Benoist, C., et al., Nature (London) 290:304-
310 (1981)); and the yeast gal4 gene sequence promoter (Johnston, S.A., et al.,
Proc. Natl. Acad. Sci. (USA) 79:6971-6975 (1982); Silver, P.A., et al., Proc. Natl.
Acad. Sci. (USA) 81:5951-5955 (1984)).

As is widely known, translation of eukaryotic mRNA is initiated at the
codon that encodes the first methionine. For this reason, it is preferable to ensure
that the linkage between a eukaryotic promoter and a DNA sequence that encodes
the GA4H gene (or a functional derivative thereof) does not contain any
intervening codons that are capable of encoding a methionine (i.e., AUG). The
presence of such codons results either in the formation of a fusion protein (if the
AUG codon is in the same reading frame as the GA4H gene encoding DNA
sequence) or a frame-shift mutation (if the AUG codon is not in the same reading
frame as the GA4H gene encoding sequence).
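
The check described above can be sketched in a few lines of code. The
following is a hedged illustration only: it scans a hypothetical leader
sequence (the DNA lying between the promoter and the intended start codon)
for ATG triplets and reports whether each lies in the same reading frame as
the intended start. The function name and the example sequences are
assumptions made for illustration, and the sketch classifies by reading
frame alone; it does not look for stop codons that would terminate an
upstream reading frame.

```python
def find_upstream_start_codons(leader):
    """Report ATG triplets in the leader preceding the intended start codon.

    `leader` is the DNA between the promoter and the intended ATG, not
    including that ATG. Returns a list of (position, frame) tuples, where
    position is the 0-based index of the upstream ATG in the leader and
    frame is "in-frame" or "out-of-frame" relative to the intended start.
    """
    leader = leader.upper()
    hits = []
    for i in range(len(leader) - 2):
        if leader[i:i + 3] == "ATG":
            # Distance from this ATG to the intended start codon.
            distance = len(leader) - i
            frame = "in-frame" if distance % 3 == 0 else "out-of-frame"
            hits.append((i, frame))
    return hits


if __name__ == "__main__":
    # Hypothetical leaders, chosen only to show the three possible outcomes.
    for leader in ("GCCACC", "ATGGCC", "GGATGCCACC"):
        print(leader, find_upstream_start_codons(leader))
```

An empty result corresponds to the preferred case in which no intervening
AUG is present; an in-frame hit corresponds to the fusion-protein case and an
out-of-frame hit to the frame-shift case described in the text.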

The GA4H gene encoding sequence and an operably linked promoter may
be introduced into a recipient prokaryotic or eukaryotic cell as a
non-replicating DNA (or RNA) molecule, which may either be a linear molecule or,
more preferably, a closed covalent circular molecule. Since such molecules are
incapable of autonomous replication, the expression of the GA4H gene may occur
through the transient expression of the introduced sequence. Alternatively,
permanent expression may occur through the integration of the introduced
sequence into the host chromosome.

In one embodiment, a vector is employed that is capable of integrating the
desired gene sequences into the host cell chromosome. Cells that have stably
integrated the introduced DNA into their chromosomes can be selected by also
introducing one or more markers that allow for selection of host cells which
contain the expression vector. The marker can provide for prototrophy to an
auxotrophic host, biocide resistance, e.g., antibiotics, or heavy metals, such as
copper, or the like. The selectable marker gene sequence can either be directly
linked to the DNA gene sequences to be expressed, or introduced into the same
cell by co-transfection. Additional elements can also be needed for optimal
synthesis of single chain binding protein mRNA. These elements can include
splice signals, as well as transcription promoters, enhancers, and termination
signals. cDNA expression vectors incorporating such elements include those
described by Okayama, H., Molec. Cell Biol. 3:280 (1983). In a preferred
embodiment, the introduced sequence is incorporated into a plasmid or viral
vector capable of autonomous replication in the recipient host. Any of a wide
variety of vectors can be employed for this purpose. Factors of importance in
selecting a particular plasmid or viral vector include: the ease with which
recipient cells that contain the vector can be recognized and selected from those
recipient cells that do not contain the vector; the number of copies of the vector
that are desired in a particular host; and whether it is desirable to be able to
"shuttle" the vector between host cells of different species. Preferred prokaryotic
vectors include plasmids such as those capable of replication in E. coli (such as,
for example, pBR322, ColE1, pSC101, pACYC 184, πVX). Such plasmids are,
for example, disclosed by Maniatis, T., et al. (In: Molecular Cloning, A
Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, NY (1982)).
Bacillus plasmids include pC194, pC221, pT127, etc. Such plasmids are
disclosed by Gryczan, T. (In: The Molecular Biology of the Bacilli, Academic
Press, NY (1982), pp. 307-329). Suitable Streptomyces plasmids include pIJ101
(Kendall, K.J., et al., J. Bacteriol. 169:4177-4183 (1987)), and Streptomyces
bacteriophages such as φC31 (Chater, K.F., et al., In: Sixth International
Symposium on Actinomycetales Biology, Akademiai Kaido, Budapest, Hungary
(1986), pp. 45-54). Pseudomonas plasmids are reviewed by John, J.F., et al.
(Rev. Infect. Dis. 8:693-704 (1986)), and Izaki, K. (Jpn. J. Bacteriol. 33:129-142
(1978)).

Preferred eukaryotic plasmids include BPV, vaccinia, SV40, 2-micron
circle, etc., or their derivatives. Such plasmids are well known in the art
(Botstein, D., et al., Miami Wntr. Symp. 19:265-274 (1982); Broach, J.R., In: The
Molecular Biology of the Yeast Saccharomyces: Life Cycle and Inheritance, Cold
Spring Harbor Laboratory, Cold Spring Harbor, NY, p. 445-470 (1981); Broach,
J.R., Cell 28:203-204 (1982); Bollon, D.P., et al., J. Clin. Hematol. Oncol. 10:39-
48 (1980); Maniatis, T., In: Cell Biology: A Comprehensive Treatise, Vol. 3,
Gene sequence Expression, Academic Press, NY, pp. 563-608 (1980)).

Once the vector or DNA sequence containing the construct(s) has been 
prepared for expression, the DNA construct(s) may be introduced into an 
appropriate host cell by any of a variety of suitable means: transformation, 
transfection, conjugation, protoplast fusion, electroporation, calcium phosphate
precipitation, direct microinjection, etc. After the introduction of the vector,
recipient cells are grown in a selective medium, which selects for the growth of 
vector-containing cells. Expression of the cloned gene sequence(s) results in the 
production of the GA4H gene, or fragments thereof. This can take place in the 
transformed cells as such, or following the induction of these cells to differentiate 
(for example, by administration of bromodeoxyuracil to neuroblastoma cells or 
the like). 

Following expression in an appropriate host, the GA4H protein can be 
readily isolated using standard techniques such as immunochromatography or 
HPLC to produce GA4H protein free of other A. thaliana proteins. 

Genetic sequences comprising the desired gene or antisense sequence 
operably linked to a plant promoter may be joined to secretion signal sequences 
and the construct ligated into a suitable cloning vector. In general, plasmid or 
viral (bacteriophage) vectors containing replication and control sequences derived 
from species compatible with the host cell are used. The cloning vector will 
typically carry a replication origin, as well as specific genes that are capable of 
providing phenotypic selection markers in transformed host cells, typically
antibiotic resistance genes.

General methods for selecting transgenic plant cells containing a
selectable marker are well known and taught, for example, by Herrera-Estrella,
L. and Simpson, J. (1988) "Foreign Gene Expression in Plants" in Plant
Molecular Biology, A Practical Approach, Ed. C.H. Shaw, IRL Press, Oxford,
England, pp. 131-160.

In another embodiment, the present invention relates to a transformed 
plant cell comprising exogenous copies of DNA (that is, copies that originated 
outside of the plant) encoding a GA4 gene expressible in the plant cell wherein 
said plant cell is free of other foreign marker genes (preferably, other foreign 
selectable marker genes); a plant regenerated from the plant cell; progeny or a 
propagule of the plant; and seed produced by the progeny. 

Plant transformation techniques are well known in the art and include
direct transformation (which includes, but is not limited to: microinjection
(Crossway, Mol. Gen. Genetics 202:179-185 (1985)), polyethylene glycol
transformation (Krens et al., Nature 296:72-74 (1982)), high velocity ballistic
penetration (Klein et al., Nature 327:70-73 (1987)), fusion of protoplasts with
other entities, either minicells, cells, lysosomes, or other fusible lipid-surfaced
bodies (Fraley et al., Proc. Natl. Acad. Sci. USA 79:1859-1863 (1982)),
electroporation (Fromm et al., Proc. Natl. Acad. Sci. USA 82:5824 (1985)) and
techniques set forth in U.S. Patent No. 5,231,019) and Agrobacterium
tumefaciens mediated transformation as described herein and in Hoekema et al.,
Nature 303:179 (1983), de Framond et al., Bio/technology 1:262 (1983), Fraley
et al. WO84/02913, WO84/02919 and WO84/02920, Zambryski et al. EP
116,718, Jordan et al., Plant Cell Reports 7:281-284 (1988), Leple et al., Plant
Cell Reports 11:137-141 (1992), Stomp et al., Plant Physiol. 92:1226-1232
(1990), Knauf et al., Plasmid 8:45-54 (1982), and Chiang et al., Plant Cell 7:195-
201 (1995). Another method of transformation is the leaf disc transformation
technique as described by Horsch et al., Science 227:1229-1230 (1985) and
Bechtold et al., Acad. Sci. Paris 316:1194-1199 (1993).

The transformation techniques can utilize DNA encoding a GA4H amino
acid sequence, including the GA4H cDNA sequence, the GA4H genomic
sequence, fragments thereof or the antisense sequence, or degenerate variants of
said sequences such that they are expressible in plants. Included within the scope
of a gene encoding a GA4H amino acid sequence are functional derivatives of the
GA4H sequences of the invention, as well as variant, analog, species, allelic and
mutational derivatives.

The preparation of functional derivatives can be achieved, for example, 
by site-directed mutagenesis (Sambrook et al., Molecular Cloning: A Laboratory
Manual, Cold Spring Harbor Press (1989)). Site-directed mutagenesis allows the
production of a functional derivative through the use of a specific oligonucleotide 
that contains the desired mutated DNA sequence. One skilled in the art will 
recognize that the functionality of the derivative can be evaluated by routine 
screening assays. 
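
To illustrate the idea of an oligonucleotide that carries the desired mutated
sequence, the sketch below builds such an oligonucleotide from a template
coding sequence by replacing one codon and keeping a fixed number of flanking
bases on each side. This is a simplified, hypothetical helper: the template,
the chosen codon, and the flank length are invented for the example, and real
primer design would also weigh melting temperature, GC content, and secondary
structure, none of which is modeled here.

```python
def design_mutagenic_oligo(template, codon_index, new_codon, flank=6):
    """Return an oligonucleotide carrying a single codon substitution.

    template    -- coding-strand DNA, read in frame from its first base
    codon_index -- 0-based index of the codon to replace
    new_codon   -- three-base replacement codon
    flank       -- number of unchanged bases kept on each side of the codon
    """
    template = template.upper()
    new_codon = new_codon.upper()
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three bases")
    start = codon_index * 3
    if start < 0 or start + 3 > len(template):
        raise ValueError("codon_index is outside the template")
    # Substitute the codon in the full-length sequence.
    mutated = template[:start] + new_codon + template[start + 3:]
    # Keep the mutated codon plus the flanking bases, clamped to the ends.
    left = max(0, start - flank)
    right = min(len(mutated), start + 3 + flank)
    return mutated[left:right]


if __name__ == "__main__":
    # Hypothetical template: ATG GCT GAA ACC GGT (Met-Ala-Glu-Thr-Gly).
    template = "ATGGCTGAAACCGGT"
    # Replace the third codon (GAA, Glu) with GCA (Ala), three bases each side.
    print(design_mutagenic_oligo(template, 2, "GCA", flank=3))
```

The mutated codon sits in the middle of the oligonucleotide so that the
unchanged flanks can anneal to the template on both sides of the mismatch.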

As used herein, modulation of GA4H expression entails the enhancement 
or reduction of the naturally occurring levels of the protein. Specifically, the 
translation of RNA encoding GA4H can be reduced using the technique of 
antisense cloning. 

In general, antisense cloning entails the generation of an expression 
module which encodes an RNA complementary (antisense) to the RNA encoding 
GA4H (sense). By expressing the antisense RNA in a cell which expresses the 
sense strand, hybridization between the two RNA species will occur resulting in 
the blocking of translation. Alternatively, overexpression of a GA4H protein 
might be accomplished by use of appropriate promoters, enhancers, and other 
modifications. Those of skill in the art would be aware of references describing 
the use of antisense genes in plants (van der Krol et al., Gene 72:45-50 (1988);
van der Krol et al., Plant Mol. Biol. 14:467-486 (1990); Zhang et al., Plant Cell
4:1575-1588 (1992)).
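
The complementarity that antisense cloning relies on can be shown with a
short sketch. Given the sense (coding-strand) cDNA, the antisense RNA is the
reverse complement written with U in place of T, and it base-pairs with the
sense mRNA when the two are aligned antiparallel. The sequence below is a
hypothetical six-base example, not a GA4H sequence, and the function names
are assumptions introduced for illustration.

```python
# Translation table for complementing DNA bases.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Watson-Crick pairs between two RNA strands.
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def sense_rna(sense_cdna):
    """mRNA sequence corresponding to the coding-strand cDNA."""
    return sense_cdna.upper().replace("T", "U")


def antisense_rna(sense_cdna):
    """RNA complementary to the sense mRNA, written 5' to 3'."""
    reverse_complement = sense_cdna.upper().translate(DNA_COMPLEMENT)[::-1]
    return reverse_complement.replace("T", "U")


def fully_paired(sense, antisense):
    """True if the two RNAs base-pair at every position when antiparallel."""
    if len(sense) != len(antisense):
        return False
    return all((s, a) in RNA_PAIRS for s, a in zip(sense, reversed(antisense)))


if __name__ == "__main__":
    cdna = "ATGGCC"  # hypothetical sense-strand fragment
    print(sense_rna(cdna))
    print(antisense_rna(cdna))
    print(fully_paired(sense_rna(cdna), antisense_rna(cdna)))
```

In a construct, this corresponds to placing the coding sequence in reverse
orientation relative to the promoter, so that the transcribed RNA is the
antisense strand.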

Other foreign marker genes (i.e., exogenously introduced genes) typically
used include selectable markers such as a neo gene (Potrykus et al., Mol. Gen.
Genet. 199:183-188 (1985)) which codes for kanamycin resistance; a bar gene
which codes for bialaphos resistance; a mutant EPSP synthase gene (Hinchee et
al., Bio/technology 6:915-922 (1988)) which encodes glyphosate resistance; a
nitrilase gene which confers resistance to bromoxynil (Stalker et al., J. Biol.
Chem. 263:6310-6314 (1988)); a mutant acetolactate synthase gene (ALS) which
confers imidazolinone or sulphonylurea resistance (EP application number
154,204); a methotrexate resistant DHFR gene (Thillet et al., J. Biol. Chem.
263:12500-12508); and screenable markers which include β-glucuronidase (GUS)
or an R-locus gene, alone or in combination with a C-locus gene (Ludwig et al.,
Proc. Natl. Acad. Sci. USA 86:7092 (1989); Paz-Ares et al., EMBO J. 6:3553
(1987)).

Alternatively, the genetic construct for expressing the desired protein can
be microinjected directly into plant cells by use of micropipettes to mechanically
transfer the recombinant DNA. The genetic material may also be transferred into
plant cells using polyethylene glycol to form a precipitation complex with the
genetic material that is taken up by cells (Paszkowski et al., EMBO J. 3:2717-22
(1984)). The desired gene may also be introduced into plant cells by
electroporation (Fromm et al., "Expression of Genes Transferred into Monocot and
Dicot Plant Cells by Electroporation," Proc. Natl. Acad. Sci. U.S.A. 82:5824
(1985)). In this technique, plant protoplasts are electroporated in the presence of
plasmids containing the desired genetic construct. Electrical impulses of high
field strength reversibly permeabilize biomembranes allowing the introduction
of plasmids. Electroporated plant protoplasts reform cell walls, divide, and form
plant calli. Selection of the transformed plant cells expressing the desired gene
can be accomplished using phenotypic markers as described above.

Another method of introducing the desired gene into plant cells is to infect
the plant cells with Agrobacterium tumefaciens transformed with the desired
gene. Under appropriate conditions well-known in the art, transformed plant cells
are grown to form shoots, roots, and develop further into plants. The desired
genetic sequences can be joined to the Ti plasmid of Agrobacterium tumefaciens.
The Ti plasmid is transmitted to plant cells on infection by Agrobacterium
tumefaciens and is stably integrated into the plant genome (Horsch et al.,
"Inheritance of Functional Foreign Genes in Plants," Science 223:496-498
(1984); Fraley et al., Proc. Nat'l Acad. Sci. U.S.A. 80:4803 (1983); Feldmann,
K.A. et al., Mol. Gen. Genet. 208:1-9 (1987); Walden, R. et al., Plant J. 1:281-
288 (1991)).

Presently there are several different ways to transform plant cells with
Agrobacterium:

(1) co-cultivation of Agrobacterium with cultured, isolated 
protoplasts, or 

(2) transformation of cells or tissues with Agrobacterium. 
Method (1) requires an established culture system that allows culturing 
protoplasts and plant regeneration from cultured protoplasts. Method (2) requires 
that the plant cells or tissues can be transformed by Agrobacterium and that the 
transformed cells or tissues can be induced to regenerate into whole plants. In the 
binary system, to have infection, two plasmids are needed: a T-DNA containing 
plasmid and a vir plasmid. 

Routinely, however, one of the simplest methods of plant transformation 
is explant inoculation, which involves incubation of sectioned tissue with 
Agrobacterium containing the appropriate transformation vector (Plant Genetic 
Transformation and Gene Expression, A Laboratory Manual, Oxford: Blackwell 
Scientific Publications (1988); Walden, Genetic Transformation in Plants, Milton
Keynes: Open University Press (1988)).

All plants from which protoplasts can be isolated and cultured to give
whole regenerated plants can be used for the expression of the desired gene.
Suitable plants include, for example, species from the genera Fragaria, Lotus,
Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium,
Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum,
Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis,
Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum,
Hemerocallis, Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus,
Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Lolium, Zea, Triticum,
Sorghum, and Datura. Additional plant genera that may be transformed by
Agrobacterium include Ipomoea, Passiflora, Cyclamen, Malus, Prunus, Rosa,
Rubus, Populus, Santalum, Allium, Lilium, Narcissus, Ananas, Arachis,
Phaseolus, and Pisum.

Plant regeneration techniques are well known in the art and include those
set forth in the Handbook of Plant Cell Culture, Volumes 1-3, Eds. Evans et al.,
Macmillan Publishing Co., New York, NY (1983, 1984, 1984, respectively);
Predieri and Malavasi, Plant Cell, Tissue, and Organ Culture 17:133-142 (1989);
James, D.J., et al., J. Plant Physiol. 132:148-154 (1988); Fasolo, F., et al., Plant
Cell, Tissue, and Organ Culture 75:75-87 (1989); Valobra and James, Plant Cell,
Tissue, and Organ Culture 27:51-54 (1990); Srivastava, P.S., et al., Plant Science
42:209-214 (1985); Rowland and Ogden, Hort. Science 27:1127-1129 (1992);
Park and Son, Plant Cell, Tissue, and Organ Culture 75:95-105 (1988); Noh and
Minocha, Plant Cell Reports 5:464-467 (1986); Brand and Lineberger, Plant
Science 57:173-179 (1988); Bozhkov, P.V., et al., Plant Cell Reports 11:386-389
(1992); Kvaalen and von Arnold, Plant Cell, Tissue, and Organ Culture 27:49-57
(1991); Tremblay and Tremblay, Plant Cell, Tissue, and Organ Culture 27:95-
103 (1991); Gupta and Pullman, U.S. Patent No. 5,036,007; Michler and Bauer,
Plant Science 77:111-118 (1991); Wetzstein, H.Y., et al., Plant Science 64:193-
201 (1989); McGranahan, G.H., et al., Bio/Technology 5:800-804 (1988); Gingas,
V.M., Hort. Science 26:1217-1218 (1991); Chalupa, V., Plant Cell Reports
9:398-401 (1990); Gingas and Lineberger, Plant Cell, Tissue, and Organ Culture
77:191-203 (1989); Bureno, M.A., et al., Phys. Plant. 55:30-34 (1992); and
Roberts, D.R., et al., Can. J. Bot. 55:1086-1090 (1990).

Plant regeneration from cultured protoplasts is described in Evans et al.,
"Protoplast Isolation and Culture," in Handbook of Plant Cell Culture 1:124-176
(MacMillan Publishing Co., New York, 1983); M.R. Davey, "Recent
Developments in the Culture and Regeneration of Plant Protoplasts," Protoplasts,
1983 - Lecture Proceedings, pp. 19-29 (Birkhauser, Basel, 1983); P.J. Dale,
"Protoplast Culture and Plant Regeneration of Cereals and Other Recalcitrant
Crops," in Protoplasts 1983 - Lecture Proceedings, pp. 31-41 (Birkhauser, Basel,
1983); and H. Binding, "Regeneration of Plants," in Plant Protoplasts, pp. 21-37
(CRC Press, Boca Raton, 1985).

Techniques for the regeneration of plants vary from species to species
but generally, a suspension of transformed protoplasts containing multiple copies
of the desired gene is first provided. Embryo formation can then be induced from 
the protoplast suspensions, to the stage of ripening and germination as natural 
embryos. The culture media will generally contain various amino acids and 
hormones, such as auxins and cytokinins. It is also advantageous to add glutamic 
acid and proline to the medium, especially for such species as corn and alfalfa. 

Mature plants, grown from transformed plant cells, are selfed to produce 
an inbred plant. The inbred plant produces seed containing the recombinant DNA
sequences promoting increased expression of GA4H. 

Parts obtained from regenerated plants, such as flowers, seeds, leaves, 
branches, fruit, and the like are covered by the invention provided that these parts
comprise the herbicidal tolerant cells. Progeny and variants, and mutants of the 
regenerated plants are also included within the scope of this invention. As used 
herein, variant describes phenotypic changes that are stable and heritable, 
including heritable variation that is sexually transmitted to progeny of plants, 
provided that the variant still comprises a herbicidal tolerant plant through 
enhanced rate of acetylation. Also, as used herein, mutant describes variation as 
a result of environmental conditions, such as radiation, or as a result of genetic 
variation in which a trait is transmitted meiotically according to well-established 
laws of inheritance. 

Plants which contain the GA4H encoding DNA of the invention and no 
other foreign marker gene are advantageous in that removal of the foreign marker 
gene, once inserted into the plant, may be impossible without also removing the 
GA4H gene. Absence of the foreign marker gene is sometimes desired so as to 
minimize the number of foreign genes expressed. This can be achieved by 
providing the GA4H-encoding DNA between Ti-plasmid borders. 

The GA4H gene product may have similar function(s) to 3-β-hydroxylase. 
3-β-hydroxylase is critical for controlling stem growth (Ingram et al., Planta 160: 
455-463 (1984)). Accordingly, the GA4H of the invention may be applied to 
crops to enhance and facilitate such stem elongation, flowering and fruiting. 
Alternatively, the DNA encoding GA4H may be genetically inserted into the plant 
host to produce a similar effect. 

All plants which can be transformed are intended to be hosts included 
within the scope of the invention (preferably, dicotyledonous plants). Such plants 
include, for example, species from the genera Fragaria, Lotus, Medicago, 
Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, 
Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura, 
Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, 
Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, 
Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, 
Cucumis, Browallia, Glycine, Lolium, Zea, Triticum, Sorghum, Malus, Apium, 
Datura, the le mutant in peas, the ga4 mutant in Arabidopsis, and the dwarf1 
mutant in monocotyledonous plants such as corn. 

Examples of commercially useful agricultural plants useful in the methods 
of the invention as transgenic hosts containing the GA4 DNA or antisense 
sequence of the invention include grains, legumes, vegetables and fruits, 
including but not limited to soybean, wheat, corn, barley, alfalfa, cotton, rapeseed, 
rice, tobacco, rye, tomatoes, beans, peas, celery, grapes, cabbage, oilseed, apples, 
strawberries, mulberries, potatoes, cranberries and lettuce. 

Having now generally described the invention, the same will be more 
readily understood through reference to the following examples which are 
provided by way of illustration, and are not intended to be limiting of the present 
invention, unless specified. 

Examples 

Example 1 
Isolation of The GA4 Homologue Genes 

The presence of a GA4 homologue gene (GA4H) was first determined by 
low stringency hybridization using a probe made from the GA4 sequence. The 
probe was designed based on the DNA sequence of a conserved amino acid 
region between GA4 and similar proteins (i.e. β-hydroxylases). 

Methods 

"Plant and Nucleic Acid Sources and Preparation" 

A ga4-1 mutant (an ethyl methanesulfonate, EMS, induced mutant) was 
obtained from M. Koornneef (Agricultural University, Wageningen, The 
Netherlands). Plants were grown under a 16-hr light/8-hr dark cycle. For genomic 
DNA isolation, rosette leaves of 3-4 week old plants were harvested and frozen 
in liquid nitrogen. For RNA isolation, tissues from mature flowering plants of 
either ga4-1 or Landsberg erecta were collected and immediately frozen in liquid 
nitrogen. 

pCD7 DNA containing the GA4 cDNA has been described previously 
(Chiang, H.H., et al., Plant Cell 7:195-201 (1995)). The cloning vectors were 
either pBSKS(-) or pBSKS(+) of Stratagene (La Jolla, CA, U.S.A.). DNA 
markers, 1 Kb and 123 bp, were from Gibco BRL (Gaithersburg, MD, U.S.A.). 
Restriction and modifying enzymes were from New England Biolabs (Cambridge, 
MA, U.S.A.). 

Genomic DNA of yeast strains carrying YAC DNA was isolated 
according to Ausubel, F.M., et al., Current Protocols in Molecular Biology, New 
York: Greene Publishing Association and Wiley-Interscience (1987). Plant 
genomic DNA was isolated by the method of Watson, J.C., and Thompson, W.F., 
Methods in Enzymology 118:57-75 (1986). RNA was isolated using the 
Tri-Reagent (Molecular Research Center, Cincinnati, OH, U.S.A.). 

"Oligonucleotides and Sequence Analysis" 

Oligonucleotides were synthesized by the DNA Synthesis Core Facility 
of the Molecular Biology/Endocrine Departments of Massachusetts General 
Hospital (MGH) (Boston, MA, U.S.A.). In the following oligonucleotides the 
underlined nucleotides indicate the restriction recognition site shown in 
parenthesis. The name and sequence of the oligonucleotides are as follows: 

Homo1: 5'-GTGGTTAGCACTAAATTCAC-3' (SEQ ID No. 11) 
Homo2: 5'-GACCCATGGCTCGGTCCGGT-3' (SEQ ID No. 12) 
GA-P1X: 5'-GCTCTAGAGAGTATTTGAGAAGG-3' (SEQ ID No. 13) (XbaI) 
GA-P2: 5'-GTTTACTATTGCCGATGACT-3' (SEQ ID No. 14) 
GA-P6: 5'-CAATACCAAAAATGAAAAGC-3' (SEQ ID No. 15) 
GA-P13: 5'-CTCCTACCGCAACCATTTC-3' (SEQ ID No. 16) 
GA-P14S: 5'-TCCCCCGGGTTTATGTGATGAGCATCCC-3' (SEQ ID No. 17) (SmaI) 
GA-P15: 5'-CCAAAGTAATTGTTTATGTG-3' (SEQ ID No. 18) 
GA-P16: 5'-AATTTAGGTTTTTCATTAAG-3' (SEQ ID No. 19) 
GA-P17: 5'-GTAGTGGTTTAGTCGTATGG-3' (SEQ ID No. 20) 
GA-P18: 5'-AAAACTTGGAGACCGGCGG-3' (SEQ ID No. 21) 
GA-P19: 5'-TATCATGTAATCTTTTTGG-3' (SEQ ID No. 22) 
GA-P20: 5'-CCGGCTTCCCGTACAGCGG-3' (SEQ ID No. 23) 
GA-P21: 5'-AATCAAGAAATTCAGTCGG-3' (SEQ ID No. 24) 
GA-P27E: 5'-GGAATTCATACCAAAAACATAAAGCC-3' (SEQ ID No. 25) (EcoRI) 
Tua4F: 5'-CTAGTTTCTTTCTTCCACG-3' (SEQ ID No. 26) 
Tua4R: 5'-TAGCTGCATCTTCTTTACC-3' (SEQ ID No. 27) 
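
As a quick consistency check on the primer list, the following minimal sketch locates the stated restriction site in two of the primers. The recognition sequences for SmaI (CCCGGG) and EcoRI (GAATTC) are standard; the function and variable names are illustrative only and are not part of the specification.

```python
# Standard recognition sequences for two enzymes named next to the primers.
SITES = {"SmaI": "CCCGGG", "EcoRI": "GAATTC"}

# Primer sequences as listed in the text (spacing removed), with the
# enzyme that the text associates with each primer.
PRIMERS = {
    "GA-P14S": ("TCCCCCGGGTTTATGTGATGAGCATCCC", "SmaI"),
    "GA-P27E": ("GGAATTCATACCAAAAACATAAAGCC", "EcoRI"),
}


def site_position(primer, enzyme):
    """Return the 0-based index of the enzyme's site in the primer, or -1."""
    return primer.upper().find(SITES[enzyme])


for name, (seq, enzyme) in PRIMERS.items():
    print(name, enzyme, site_position(seq, enzyme))
```

In both primers the site sits near the 5' end, which is the usual arrangement when a site is added as a cloning tail rather than as template-matching sequence.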

DNA sequences were determined by the DNA Sequencing Core Facility 
of the Department of Molecular Biology at Massachusetts General Hospital. 
Sequence analyses were performed using the software package of the Genetics 
Computer Group (GCG; Madison, WI, U.S.A.). Blast searches were conducted 
through the National Center for Biotechnology Information (NCBI) (Bethesda, 
MD, U.S.A.) using the algorithm of Altschul, S.F., et al., J. Mol. Biol. 
215:403-10 (1990). 

"Polymerase Chain Reaction" 

PCR was performed using the Peltier Thermal Cycler (PTC-200) of MJ 
Research (Watertown, MA, U.S.A.). A DNA fragment containing a conserved 
region on the second exon of the GA4 gene (Chiang, H.H., et al., Plant Cell 
7:195-201 (1995)) was generated by PCR using Homo1 and Homo2 primers. 
Probes prepared from this fragment (Homologous probes) were used for the 
genomic DNA gel blot and for screening the genomic library. The PCR reaction 
was carried out in 100 µl total volume and contained 0.4 ng of pCD7 DNA, 200 
µM of dNTP, 15 µM of each primer, and 2.5 units of Taq DNA polymerase 
(Boehringer Mannheim, Indianapolis, IN, U.S.A.). The PCR temperature profile 
was 35 cycles of 1 minute at 94°C, 1 minute at 50°C, and 3 minutes at 72°C. 
Preparation of the Unique probes was described earlier (Chiang, H.H., et al., 
Plant Cell 7:195-201 (1995)). 
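
The temperature profile above can be turned into a rough run-time estimate. The sketch below simply multiplies the cycle count by the summed hold times; it ignores ramp times and any initial denaturation or final extension, none of which are stated in the text. The function name is illustrative.

```python
def programmed_minutes(cycles, step_seconds):
    """Total hold time in minutes for `cycles` repeats of the given steps."""
    return cycles * sum(step_seconds) / 60


# 35 cycles of 1 min at 94 C, 1 min at 50 C, and 3 min at 72 C.
probe_profile = programmed_minutes(35, [60, 60, 180])
print(probe_profile)  # 175.0
```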

In the mapping study, 1 µg of each YAC DNA was used as template for 
PCR amplification of the two homologous genes. The GA4H1 gene was 
amplified using GA-P2 and GA-P6 primers. The GA4H2 gene was amplified 
using GA-P19 and GA-P20 primers. Each PCR reaction was carried out in 25 µl 
total volume and contained 80 µM of dNTPs, 10 µM of each primer, and 2 units 
of Taq DNA polymerase (Boehringer Mannheim). The PCR was performed using 
35 cycles of 40 seconds at 92°C, 40 seconds at 55°C, and 40 seconds at 72°C. One 
fifth of the PCR product was separated on a 0.8% agarose gel. 



"RT-PCR Conditions" 

First strand cDNA synthesis was performed according to Sambrook, J., 
et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor: Cold 
Spring Harbor Laboratory (1989). The reaction volume was 25 µl and it 
contained 1 µg of total RNA, 9 µM of (dT)20, 1.2 mM dNTP, 136 units of 
RNase inhibitor (Amersham, Arlington Heights, IL, U.S.A.), and 9.5 units of 
avian myeloblastosis virus (AMV) reverse transcriptase (Promega, Madison, WI, 
U.S.A.). The reaction was incubated at 42°C for one hour and then at 72°C for 15 
minutes. Eight microliters of the first strand cDNA was used as template in the 
PCR amplification. The reaction was in 50 µl and used 63 µM of dNTP, 0.6 µM 
of homologous gene specific primer, 0.4 µM of tubulin primer, and 2.5 units of 
Taq DNA polymerase (Boehringer Mannheim). The thermal profile was 40 cycles 
of 45 seconds at 94°C, 45 seconds at 55°C, and 45 seconds at 72°C. When 
amplifying the full length cDNA, tubulin primers were not included and the 
extension time of 45 seconds at 72°C was increased to 1.5 minutes. One-tenth of 
the PCR product was analyzed on an agarose gel. 

"Genomic Library Screening" 

An Arabidopsis genomic library made from ecotype C24 was kindly 
provided by Dr. Lin Sun (Nemapharm, Cambridge, MA, U.S.A.). This library 
was constructed using Sau3A partially digested genomic DNA, which was 
subsequently cloned into the XhoI site of the λFIX-II vector (Stratagene). Screening 
of the library was performed according to the manufacturer's protocol 
(Stratagene). Plaques were transferred and crosslinked to Biotrans nylon 
membrane by autoclaving for 2 minutes. Homologous probes were prepared and 
the hybridization conditions were as described in Chiang, H.H., et al., Plant Cell 
7:195-201 (1995), except that Homo1 and Homo2 primers were used and filters 
were hybridized at 42°C (low stringency). Filters were washed once in 2X SSC 
(1X = 0.15 M NaCl, 0.015 M sodium citrate) for 15 minutes at room temperature 
and twice in 0.1X SSC, 0.1% SDS for 30 minutes at 42°C (low stringency). 
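
For reference, the salt concentrations of the two wash buffers follow directly from the 1X SSC definition given above. The short sketch below does the scaling; the dictionary keys and function name are illustrative.

```python
# 1X SSC as defined in the text: 0.15 M NaCl, 0.015 M sodium citrate.
SSC_1X = {"NaCl_M": 0.15, "sodium_citrate_M": 0.015}


def ssc(strength):
    """Molar composition of an SSC buffer at the given X strength."""
    return {name: round(molar * strength, 6) for name, molar in SSC_1X.items()}


print(ssc(2))    # the single 2X wash
print(ssc(0.1))  # the two 0.1X washes
```

The 0.1X washes therefore contain one twentieth of the salt of the 2X wash.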

"DNA Gel Blot Analysis" 

In the genomic Southern, Arabidopsis (ecotype Landsberg erecta) genomic 
DNA was digested with appropriate restriction enzymes, separated by agarose gel 
electrophoresis, and transferred to Biotrans membrane (ICN Biomedical Inc., 
Aurora, OH, U.S.A.) as described in Chiang, H.H., et al., Plant Cell 7:195-201 
(1995). For the homologous and unique GA4 gene probes, the hybridization and 
washing conditions were the same as the library screening above (low 
stringency). The DNA gel blot analysis using the GA4H1 gene (p3-1) probes was 
performed as described in Chiang, H.H., et al., Plant Cell 7:195-201 (1995). The 
hybridization and washing conditions were performed at 65°C (high stringency). 

DNA blot analyses for the mapping and RT-PCR products were 
performed as described (Cheng, C.L., et al., Proc. Natl. Acad. Sci. U.S.A. 
89:1861-4 (1992)). In the mapping of homologous genes by PCR, probes specific 
to these genes were generated by PCR. Probes were prepared using a 4.4 kbp 
BglII/XhoI genomic DNA fragment, containing these two genes, as template 
with four primers (GA-P2, GA-P6, GA-P19, and GA-P20). The reaction was in 
50 µl and it contained 5 ng of DNA template, 100 µM each of dCTP, dGTP, and 
dTTP, 5 µM dATP, 50 µCuries of α-32P dATP (Dupont NEN, Wilmington, DE, 
U.S.A.), 0.4 µM of each primer, and 2.5 units of Taq DNA polymerase (Boehringer 
Mannheim). The thermal profile was 30 cycles of 40 seconds at 94°C, 30 seconds 
at 55°C, and 30 seconds at 72°C. 

In the RT-PCR DNA gel blot, the same PCR method as above was used 
to prepare the GA4H1 and GA4H2 specific probes, except that different primers 
were employed. Primer pairs of GA-P13/GA-P17 and GA-P18/GA-P20 were 
used to prepare GA4H1 and GA4H2 gene probes, respectively. 

Results 

To isolate the DNA sequences with similar sequence to the GA4 gene 
(ATCC accession nos. 98393 and 98394), low stringency hybridization (see 
Materials and Methods) to Arabidopsis genomic DNA was performed with 
Homologous probes (SEQ ID No. 2) prepared from a conserved region of the GA4 
gene (Figure 1), compared to GA5 and other β-hydroxylases. Results from the 
blot of this genomic DNA, isolated from ecotype Landsberg erecta, are shown in 
Figure 2A. 

Besides a strong 3.2 kbp size band in the HindIII digested DNA, a less 
intense 2.1 kbp band is visible and assumed to contain DNA similar to the GA4 
gene (Figure 2A, lane 1). Similarly, there is a light 2.8 kbp band in the BamHI 
digested DNA (Figure 2A, lane 2). 

To identify the GA4 gene, a similar blot was hybridized at low stringency 
to a Unique probe (Figure 1 - SEQ ID No. 3) derived from a less conserved 
region of the GA4 gene. This probe would hybridize specifically to the GA4 gene, 
and results are shown in Figure 2B. 

In the HindIII digestion, the GA4 specific probes hybridized strongly to 
the 3.2 kbp size band, and no detectable signal was found at the 2.1 kbp size 
(Figure 2B, lane 1). Similarly, the 2.8 kbp band in the BamHI digested DNA was 
not visible, indicating that the 2.1 kbp HindIII and the 2.8 kbp BamHI fragments 
contain a homologous sequence to the GA4 DNA (Figure 2B, lane 2). DNA 
digested with the EcoRI enzyme resulted in only high molecular weight bands 
being visible when either Homologous or Unique probes were used (Figure 2A 
and 2B, lane 3). 
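
The reasoning in the two blots amounts to a set difference: bands seen with the Homologous probe but not with the Unique (GA4-specific) probe are candidate homologue fragments. The sketch below illustrates this for the HindIII lane only, using the band sizes reported above; the same logic applies to the BamHI lane, where the text reports only the 2.8 kbp band. Names are illustrative.

```python
# Band sizes (kbp) reported in the text for the HindIII digest.
homologous_probe = {"HindIII": {3.2, 2.1}}
unique_probe = {"HindIII": {3.2}}


def homologue_candidates(homologous, unique):
    """Bands detected by the Homologous probe but not by the Unique probe."""
    return {
        enzyme: sorted(bands - unique.get(enzyme, set()))
        for enzyme, bands in homologous.items()
    }


print(homologue_candidates(homologous_probe, unique_probe))
```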

The homologous probes were also used to screen a genomic library 
(ecotype C24) at low stringency conditions as described above. In addition to the 
GA4 genomic clones, one other genomic clone (λ3) that contained the 2.1 kbp 
HindIII fragment was isolated. This 2.1 kbp fragment of λ3 was subcloned into 
pBSKS(-) to produce p3-1 (Figure 3). The whole genomic insert in λ3 was also 
cloned into pBSKS(+) using the NotI sites flanking the insert to generate 
pLVN103. To confirm this genomic clone, the p3-1 DNA was used as a probe 
and hybridized at high stringency to the same genomic blot above. As shown in 
Figure 2C, both the 2.1 kbp HindIII and 2.8 kbp BamHI fragments are present 
(lanes 1 and 2). The predicted high molecular weight fragment in EcoRI digested 
DNA is also present (lane 3). These results indicated that the predicted 
homologue of the GA4 gene had been isolated. 

Clones p3-1 and part of pLVN103 DNAs were sequenced, and the 
homologue gene was named GA4H1. Further sequencing in the 5' flanking region of the 
GA4H1 gene revealed a second gene, named GA4H2, that also has sequence 
similarity to the GA4 as well as to the GA4H1 genes. The genome organization 
of these two linked genes is represented in Figure 3. When compared to the GA4 
gene, both the GA4H1 and GA4H2 genes also possess a single intron that is 
located at a similar position in the gene. Transcription of both genes is in the 
same direction, and they are separated by a 1 kbp spacer region (Figure 3). 

The plasmid designated pLVN103 comprising the genomic sequence of 
both the GA4H1 and GA4H2 genes was deposited at the ATCC (Rockville, MD.) 
under the terms of the Budapest Treaty and has been granted accession number 
98436. 

Example 2 

Chromosomal Location of the Homologue Genes 

It was determined that both homologue genes are located on chromosome 
1. Since many continuous overlapping DNA clones of Yeast Artificial 
Chromosomes (YAC) containing Arabidopsis genomic DNA had been placed on 
the five linkage groups, the GA4H1 and GA4H2 genes can be mapped by 
anchoring them to YACs of known position. 

Probes derived from the genomic clone p3-1 were hybridized to the CIC 
YAC library (Creusot, F., et al., Plant Journal 8:763-70 (1995)), and three YAC 
clones (CIC1E4, CIC6C10 and CIC10A11) were isolated (data not shown). The 
intensity of the hybridization was higher in CIC1E4 and CIC6C10 than in 
CIC10A11 (data not shown). These putative YACs were subsequently confirmed 
by PCR amplification using primers specific to these two genes. 

Two specific primer sets (GA-P2/GA-P6 and GA-P19/GA-P20 for 
GA4H1 and GA4H2 genes, respectively) were used to amplify a short region in 
these genes. The predicted amplified products for GA4H1 and GA4H2 genes are 
480 bp and 410 bp, respectively. The analysis of PCR products is shown in 
Figure 4A. For the GA4H2 gene, the predicted PCR product of 410 bp was 
present in both the control pLVN103 DNA (lane 2) and in two of the three 
putative YACs, CIC1E4 (lane 4) and CIC6C10 (lane 5). However, CIC10A11 
YAC did not appear to carry the GA4H2 gene, since the 410 bp size band was not 
present (lane 6). 

The CIC6C3 YAC, located on the bottom of chromosome 2, was used as 
a negative control. As expected, no PCR product was present in CIC6C3, 
indicating the specificity of these primers (lane 1). Similar results were also 
obtained for the GA4H1 gene where the predicted PCR product is 480 bp in size. 
The 480 bp size band was present in the pLVN103 control (lane 8) as well as in 
CIC1E4 and 6C10 (lanes 10 and 11). Again, the 480 bp size band was absent in 
CIC10A11. These results were further confirmed by the DNA gel blot. 

Probes, generated using the same 4 primers with the genomic clone 
(pLVN103), were hybridized to the DNA blot, and the results are shown in Figure 
4B. All predicted PCR products of 410 bp and 480 bp in size (for GA4H2 and 
GA4H1 genes, respectively) hybridized to the probes. Since both CIC1E4 
and CIC6C10 were previously anchored to the bottom of chromosome 1, it was 
concluded that GA4H1 and GA4H2 genes are located at about 
159 cM (on the physical map) of chromosome 1 
(http://cbil.humgen.upenn.edu/~atgc/ATGCUP.html; 
http://cbil.humgen.upenn.edu/~atgc/physical-mapping/xlchl_pt4.html). 
CIC10A11 has overlapping regions with those two YACs above, and it hybridized 
weakly to probes prepared from p3-1. However, no PCR product was amplified 
when CIC10A11 was used as a template DNA. These results suggest that the 
edge of CIC10A11 DNA may end shortly after the HindIII site, located in the 3' 
flanking region of the GA4H1 gene (see Figure 3). 
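
The anchoring argument in this Example can be summarized as a small lookup: a YAC supports the map position only if it yields the predicted PCR product for both genes. The sketch below encodes the presence or absence of the 480 bp (GA4H1) and 410 bp (GA4H2) products as reported above; the data structure and names are illustrative.

```python
# PCR outcomes reported in the text (True = predicted band present).
pcr_results = {
    "pLVN103": {"GA4H1": True, "GA4H2": True},    # positive control plasmid
    "CIC6C3": {"GA4H1": False, "GA4H2": False},   # negative control YAC
    "CIC1E4": {"GA4H1": True, "GA4H2": True},
    "CIC6C10": {"GA4H1": True, "GA4H2": True},
    "CIC10A11": {"GA4H1": False, "GA4H2": False},
}
CONTROLS = {"pLVN103", "CIC6C3"}


def yacs_with_both_genes(results):
    """Candidate YACs (controls excluded) positive for every gene tested."""
    return sorted(
        name
        for name, outcome in results.items()
        if name not in CONTROLS and all(outcome.values())
    )


print(yacs_with_both_genes(pcr_results))  # ['CIC1E4', 'CIC6C10']
```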

Example 3 

Cloning of GA4H1 and GA4H2 cDNAs By RT-PCR 

To determine whether the GA4H1 gene is expressed, probes derived from 
the clone p3-1 containing most of the GA4H1 coding region were used to 
hybridize to RNA isolated from flowers, shoot meristems, leaves, roots and 
siliques. However, no visible signal was present in the RNA blot (data not 
shown). Another attempt to isolate the cDNA by screening a yeast expression 
library (Minet, M., et al., Plant Journal 2:417-422 (1992)) using probes derived 
from p3-1 also failed. Furthermore, a search of the Arabidopsis EST database 
using the GA4H1 sequence found no match to any known EST, indicating 
that the GA4H1 gene may be expressed at very low levels or only in a specific 
developmental stage of the plant. Therefore, isolation of the GA4H1 cDNA by 
reverse-transcriptase PCR (RT-PCR) was undertaken. 

The ga4 mutant was used as a source of RNA since the expression of the 
GA4 gene is under feedback regulation resulting in the induction of its mRNA 
(Chiang, H.H., et al., Plant Cell 7:195-201 (1995)). If the expression of the 
GA4H1 gene is regulated by the same or a similar mechanism, i.e. a higher level 
of GA4H1 mRNA in the ga4 mutant than wild type, then one has a better chance 
of obtaining the cDNA in the ga4 mutant background. 

RT-PCR was performed using RNAs isolated from whole seedlings of 
ga4-1 (EMS) and ga4-2 (T-DNA) mutants grown in liquid and from leaves and 
inflorescences of soil grown ga4-1 plants. Inflorescences contain the shoot 
meristems, flowers and siliques. A predicted PCR product was observed only in 
RNA isolated from inflorescence tissues (data not shown). Therefore, 
inflorescences were used as a source of RNA for cloning the GA4H1 cDNA. 
Primers GA-P15 and GA-P16 were used in PCR following the reverse 
transcription. A nested PCR using GA-P1X and GA-P14S primers was 
performed, and the product was subsequently cloned into pBSKS(+) at the SmaI 
and XbaI sites. 

Since Taq DNA polymerase, a low fidelity enzyme, was used in the PCR 
amplification, three independent RT-PCR clones (pLVN107a, b, c) were 
sequenced. The consensus sequence of this cDNA clone, labeled as pLVN107, 
is shown in Figure 5 (SEQ ID No. 5). 

The cDNA contains 43 and 22 nucleotides in the 5' and 3' untranslated 
regions of the gene, respectively. Four of the nine nucleotides in the sequence 
surrounding the predicted start codon (ATG) are identical to the consensus 
sequence (Joshi, C. P., Nucleic Acids Res. 15:6643-53 (1987)). The intron occurs 
at a similar position relative to the GA4 gene. The GA4H1 genomic DNA 
sequence (SEQ ID No. 6), along with its deduced amino acid sequence (SEQ ID 
No. 7), are shown in Figure 6. The gene possesses a single 409 bp intron, and it 
follows the intron's GT/AG consensus rule. This gene encodes a protein 
355 amino acids long. 
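
The figures in this paragraph imply an approximate length for the cloned cDNA. The sketch below adds the untranslated regions to the coding length; it assumes the stop codon is counted with the coding region and not within the 22-nucleotide 3' untranslated region, which the text does not state, so the total is an estimate only.

```python
def coding_length_nt(n_amino_acids, include_stop=True):
    """Nucleotides needed to encode a protein, optionally with a stop codon."""
    codons = n_amino_acids + (1 if include_stop else 0)
    return 3 * codons


utr5_nt, utr3_nt, protein_aa = 43, 22, 355   # values given in the text
cds_nt = coding_length_nt(protein_aa)        # assumption: stop codon included
cdna_nt = utr5_nt + cds_nt + utr3_nt
print(cds_nt, cdna_nt)  # 1068 1133
```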

Comparison between the RT-PCR sequence (pLVN107) and the genomic 
sequence (pLVN103) revealed one nucleotide mismatch at position no. 1059 
of the cDNA sequence (Figure 5) (SEQ ID No. 5). The cDNA has a "G" at this 
position while the genomic DNA has an "A". This mismatch may arise from 
differences in the Landsberg erecta (L. er.) and C24 ecotypes from which cDNA 
and genomic sequences were derived, respectively. 

To resolve this, the genomic DNA of the L. er. ecotype was cloned by 
PCR amplification with a high fidelity enzyme, Pfu (Stratagene), using GA-P1X 
and GA-P14S primers. The sequence of this clone, pLVN110, is identical to the 
genomic clone in C24 ecotype, pLVN103 (data not shown). Therefore, the 
mismatch at this position could not be resolved by current data. 
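
A comparison of this kind, between a cDNA and the corresponding genomic sequence with the intron removed, can be sketched as a position-by-position scan. The sequences below are short hypothetical stand-ins, not the sequences of pLVN107 or pLVN103, and the function name is illustrative.

```python
def mismatches(seq_a, seq_b):
    """Return (1-based position, base_a, base_b) for each differing position.

    Both sequences must already be aligned and of equal length, for example
    a cDNA and the spliced (intron-free) genomic sequence.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (pre-aligned)")
    pairs = zip(seq_a.upper(), seq_b.upper())
    return [(pos, a, b) for pos, (a, b) in enumerate(pairs, start=1) if a != b]


# Hypothetical toy sequences with a single G/A difference.
cdna_toy = "ATGGCGTTA"
genomic_toy = "ATGGCATTA"
print(mismatches(cdna_toy, genomic_toy))  # [(6, 'G', 'A')]
```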

Similar RT-PCR conditions were used to isolate the GA4H2 cDNA except 
that GA-P27E and GA-P21 primers (SEQ ID Nos. 11 and 14 respectively) were 
used, and the RNA source was Landsberg erecta. One cDNA was cloned, 
pLVN115, and its sequence (SEQ ID No. 8) is shown in Figure 7. 

Similar to the GA4 and GA4H1 genes, there is a single intron present at a 
conserved position in the gene. The sequence surrounding the predicted ATG 
shows 3/9 matches against the consensus sequence. The genomic sequence of the 
gene is shown in Figure 8 (SEQ ID No. 9). Sequence comparison between this 
cDNA and its genomic DNA shows a perfect match. The GA4H2 gene encodes 
a protein 347 amino acids long (SEQ ID No. 10). 

Example 4 

Sequence Analysis of the GA4H1 and GA4H2 Proteins 

Each of the protein sequences (SEQ ID Nos. 7 and 10) of the GA4H1 and 
GA4H2 genes was searched against the protein Genbank database, and each time 
the GA4 protein was found to be the best match. This is not surprising, since 
these genes were isolated by hybridization to probes prepared from the GA4 
DNA sequence. 

The predicted proteins encoded by the GA4, GA4H1, and GA4H2 genes 
were compared and the results of this comparison are shown in Figure 9. As 
expected, many conserved regions are present throughout these proteins. 
However, GA4H2 protein has higher homology to GA4 than does GA4H1. 
Amino acid sequence identity was calculated among these proteins using GAP 
software of the GCG package and the results are shown in Figure 10. GA4H2 and 
GA4 share 76% and 85% amino acid identity and similarity, respectively. 
Compared to this, GA4H1 and GA4 only share 57% and 73% amino acid identity 
and similarity, respectively. Results of comparison between GA4H1 and GA4H2 
are similar to those between GA4H1 and GA4. 
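
Percent identity between two aligned protein sequences can be illustrated with a simple column count. The sketch below is not the GCG GAP program and may differ from it in how gapped columns are treated; here they are simply skipped. Similarity is not computed, since that requires a substitution scoring table. The aligned strings are hypothetical toy data.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment: 4 identical residues out of 6 ungapped columns.
print(round(percent_identity("MKT-LLA", "MRTALLG"), 1))  # 66.7
```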

Several enzymes, including GA4 (β-hydroxylase), in the gibberellin 
biosynthesis pathway belong to a group of non-heme iron-containing enzymes 
called 2-oxoacid-dependent dioxygenases (2-ODD). A binary comparison 
between these proteins and the three proteins described above is shown in Figure 
10. Proteins of different functions often share around 30% amino acid identity, 
while those from a multigene family in the same species show greater than 50% 
amino acid identity (Prescott, A.G., and John, P., Annu. Rev. Plant Physiol. Plant 
Mol. Biol. 47:245-271 (1996)). Results in Figure 10 appear to support this 
observation with the exception of the GA4, GA4H1 and GA4H2. These three 
proteins share greater than 50% amino acid identity, which indicates that they 
belong to the same family and/or may have similar enzyme activities. 

Example 5 

Differential Expression of GA4H1 and GA4H2 Genes 

Since the expression of the GA4 gene is primarily in the silique, the 
expression levels of GA4H1 and GA4H2 genes in various organs were investigated 
to determine whether a similar expression pattern occurred. RT-PCR using 
Arabidopsis (Landsberg erecta) RNAs isolated from liquid grown roots, soil grown 
rosette leaves, floral shoots (including flowers), and siliques was performed. 
GA-P13/GA-P17 and GA-P18/GA-P20 primer pairs were used to amplify the 
GA4H1 and GA4H2 genes, respectively. Primers in each pair, located on 
separate exons, were used to differentiate between cDNA and genomic DNA. 
The predicted RT-PCR products of GA4H1 and GA4H2 genes are 220 bp and 
440 bp, respectively. The predicted PCR products of GA4H1 and GA4H2 
genomic DNAs (containing the intron sequence) are 630 bp and 860 bp, 
respectively. 
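
Because the primers flank the intron, the size of a band indicates its template. The sketch below uses the predicted sizes given above to compute the intron span implied by each primer pair and to label an observed band; the 20 bp tolerance is an arbitrary illustrative value, not one taken from the text.

```python
# Predicted product sizes (bp) from the text.
EXPECTED = {
    "GA4H1": {"cDNA": 220, "genomic": 630},
    "GA4H2": {"cDNA": 440, "genomic": 860},
}


def implied_intron_span(gene):
    """Difference between genomic and cDNA product sizes for a primer pair."""
    return EXPECTED[gene]["genomic"] - EXPECTED[gene]["cDNA"]


def classify_band(gene, size_bp, tolerance=20):
    """Label a band as 'cDNA', 'genomic' or 'unexpected' (tolerance is hypothetical)."""
    for template, expected in EXPECTED[gene].items():
        if abs(size_bp - expected) <= tolerance:
            return template
    return "unexpected"


print(implied_intron_span("GA4H1"), implied_intron_span("GA4H2"))  # 410 420
print(classify_band("GA4H1", 625))  # genomic
```

The 410 bp difference for GA4H1 is consistent with the 409 bp intron reported for that gene, given that the product sizes are rounded.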

Primers from the α-tubulin 4 gene, TUA4 (Kopczak, S.D., et al., Plant 
Cell 4:539-47 (1992)), were used as an internal control along with GA4 
homologue gene specific primers. The α-tubulin primers generate a 320 bp 
RT-PCR product. Results of RT-PCR analysis are shown in Figure 11A. 

To confirm the PCR products, a DNA gel blot analysis was performed 
using probes derived from the GA4H1 gene (Figure 11B). The GA4H1 gene was 
mainly expressed in the flowers and shoot meristems, with smaller amounts in the 
siliques (Figures 11A and 11B, lanes 5 and 8). In addition, GA4H1 gene was 
barely detected in the root tissues (Figures 11A and 11B, lane 7). However, there 
was no detectable level of GA4H1 gene in the rosette leaves (Figures 11A and 
11B, lane 6). 



Similar to the polymerase chain reaction control, the 630 bp product was 
present in pLVN103 containing the genomic clone (lane 3). There was a small 
amount of genomic DNA present in the RNA preparation, as indicated by the 
presence of the 630 bp size band in all tissue types. pCD7 (GA4 cDNA clone) 
and pLVN115 (GA4H2 cDNA clone) were also used as templates to demonstrate 
the specificity of the GA-P13 and GA-P17 primer pair. 

Although some nonspecific PCR products (Figure 11A, lanes 1 and 2) were 
present, these primers amplified neither the GA4 nor the GA4H2 gene (Figure 
11B, lanes 1 and 2). The internal RT-PCR control (α-tubulin 4 gene) was present 
evenly in different tissue types with the exception of siliques (Figure 11A, lane 
8). This may indicate that less silique RNA was used in this experiment, 
suggesting that the expression level of these genes in siliques was underestimated. 

A similar experiment was performed on the GA4H2 gene where GA-P18 
and GA-P20 primers were used in the amplification. Again, there was less silique 
RNA used, as indicated in Figure 12A (lanes 1-4). Unlike the GA4H1 gene, 
GA4H2 transcripts were more abundant in the root tissues, while lower levels 
were present in the flowers and shoot meristems (Figure 11A and 11B, lanes 2 
and 4). In addition, GA4H2 expression is barely detected in siliques but not in 
leaves (Figure 11A and 11B, lanes 1 and 3). Again, the expression level of GA4H2 
gene in siliques was underestimated when compared to other tissues. A genomic 
DNA clone (pLVN103) was used as the control, and it possesses the predicted 860 
bp size band (Figure 11A and 11B, lane 6). Similar to the GA4H1 RT-PCR result, 
primers used in this experiment were specific to the GA4H2 gene (Figure 11A 
and 11B, lanes 7 and 8). 

Example 6 

Expression of Antisense GA4H RNA 

An expression vector is constructed using methods well known in the art, 
such that it expresses an RNA complementary to the sense strand GA4H RNA. 
The antisense GA4H RNA is expressed in a constitutive fashion using promoters 
that are constitutively expressed in a given host plant, for example, the 
cauliflower mosaic virus 35S promoter. Alternatively, the antisense RNA is 
expressed in a tissue specific fashion using tissue specific promoters. As 
described earlier, such promoters are well known in the art. 

In one example, the antisense construct pP035 (Oeller et al., Science 
254:437-439 (1991)) is cut with BamHI and SacI to remove the tACC2 cDNA 
sequence. After removing the tACC2 cDNA, the vector is treated with the 
Klenow fragment of E. coli DNA polymerase I to fill in the ends, and the 
sequence described in Figure 6 or 8 is blunt end ligated into the vector such that 
the strand operably linked to the promoter is that which transcribes the GA4H 
antisense RNA sequence. The ligated vector is used to transform an appropriate 
E. coli strain. 

Colonies containing the ligated vector are screened using colony 
hybridization or Southern blotting to obtain vectors which contain the GA4H 
cDNA in the orientation which will produce antisense RNA when transcribed 
from the 35S promoter contained in the vector. 

The antisense GA4H vector is isolated from a colony identified as having 
the proper orientation and the DNA is introduced into plant cells by one of the 
techniques described earlier, for example, electroporation or Agrobacterium/Ti 
plasmid mediated transformation. 

Plants regenerated from the transformed cells express antisense GA4H 
RNA. The expressed antisense GA4H RNA binds to sense strand GA4H RNA 
and thus prevents translation. 
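
The relationship between a sense cDNA and the antisense transcript described here is a reverse complement. The sketch below illustrates it with a hypothetical six-base sequence; it is not derived from the GA4H sequences, and the function names are illustrative.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(sense_cdna):
    """RNA complementary to the sense strand, written 5' to 3'."""
    return reverse_complement(sense_cdna).upper().replace("T", "U")


# Hypothetical toy sense sequence.
print(antisense_rna("ATGGCC"))  # GGCCAU
```

Inserting the cDNA in the opposite orientation relative to the promoter is what causes this complementary strand to be transcribed.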

In an initial experiment the phenotypes of transgenic plants expressing the 
antisense of the GA4H1 gene were examined. Constructs carrying the sense and 
antisense of the GA4H1 cDNA, under transcriptional control of the cauliflower 
mosaic virus 35S promoter, were transferred into Arabidopsis thaliana ecotype 
Landsberg erecta via Agrobacterium mediated transformation (Bechtold et al., C. R. Acad. 
Sci. Paris 316:1194-1199 (1993)). These constructs contained a neomycin 
phosphotransferase (NPT-II) gene whose product confers resistance to kanamycin. 
Transgenic seed were harvested and subsequently germinated on MS medium 
supplemented with 50 mg/L kanamycin. Resistant seedlings (T1 generation) were 
transplanted to soil and the height was measured on mature plants. 
Untransformed plants, Landsberg erecta ecotype, were grown similarly but in the 
absence of kanamycin. 

Results of transgenic plants carrying the sense or antisense cDNA of the 
GA4H1 gene are shown in Figure 13. Overexpression of the GA4H1 cDNA in 
the sense orientation does not seem to alter the plant's height. However, several 
plants carrying the antisense of the GA4H1 cDNA exhibit dwarf phenotype. 
These preliminary results require further validation, especially in the subsequent 
generation. These results suggest that one can use the GA4H1 gene in the 
antisense orientation to generate dwarf plants. 

EXAMPLE 7 
GA4H Protein Level in Wild-Type and 
Transgenic Lines 

Agrobacterium tumefaciens-mediated transformation of Arabidopsis root 
explants. 

The transformation procedure has been described previously (Valvekens et al., 
1988) with slight modifications (Sun et al., Plant Cell 4:119-128 (1992)). Sense 
or anti-sense DNA is introduced into Agrobacterium LBA4404 by electroporation 
(Ausubel et al., Current Protocols in Molecular Biology (New York: Greene 
Publishing Associates/Wiley-Interscience) (1990)). Stability of the insert of the 
plasmid in LBA4404 is tested by restriction digestion and gel electrophoresis of 
plasmid DNA purified by the NaOH/SDS minipreparation procedure (Ausubel et al., 
Current Protocols in Molecular Biology (New York: Greene Publishing 
Associates/Wiley-Interscience) (1990)). 

A fresh overnight culture of LBA4404 carrying individual plasmids is 
used to infect root explants of four-week-old wild-type plants. Km^r transgenic 
plants are regenerated as described (Valvekens et al., Proc. Natl. Acad. Sci. USA 
85:5536-5540 (1988)). Seeds of transgenic plants are germinated on MS agar 
plates containing kanamycin (50 µg/ml). Non-germinating seeds after 8 days 
are transferred onto MS plates containing 100 µM GA3 and 50 µg/ml kanamycin 
to score for GA-/Km^r and GA-/Km^s segregation. 

The levels of GA4H proteins in both sense and antisense transgenic 
Arabidopsis plants are compared to the level in wild-type plants (ecotype 
Landsberg erecta) by immunoblot analysis. Supernatant fractions are obtained 
by tissue extraction and centrifugation (Bensen and Zeevaart, J. Plant Growth 
Regul. 9:237-242 (1990)). 

The expression of a gene in a plant is directed such that the gene has the 
same temporal and spatial expression pattern as GA4H. The gene is operably 
linked to the regulatory sequences of GA4H DNA to create an expression 
module, and a plant is then transformed with the expression module. One can 
examine the pattern of expression of the endogenous GA4H gene using a 
promoter-glucuronidase (GUS) gene fusion. The data from this analysis are used 
to design plant organ-specific promoters and cDNA gene fusions in order to 
manipulate GA biosynthesis in specific plant organs. 



Immunoblot Analyses 

Proteins from 2-week-old Arabidopsis seedlings are extracted and 
fractionated by centrifugation at 10,000 g for 10 min and then at 100,000 g for 90 
min at 4°C (Bensen and Zeevaart, J. Plant Growth Regul. 9:237-242, 1990). The 
100,000 g supernatant fractions (50 mg each) are loaded on an 8% SDS-PAGE 
gel, electrophoresed and transferred to a GeneScreen membrane (Du Pont-New 
England Nuclear). Immunoblot analysis is carried out as described (Sambrook 
et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Cold 
Spring Harbor, N.Y. 1989). The membrane is incubated with GA4H antisera 
(primary antibody), then with 2500-fold diluted peroxidase-conjugated goat 
anti-rabbit antisera (secondary antibody, Sigma), and detected using the enhanced 
chemiluminescence reagent (ECL, Amersham) followed by autoradiography. 
EXAMPLE 8 

Over-Expression of GA4H Proteins in E. coli and the Procedure 
for Generating GA4H Antibodies 



Methods for heterologous expression of DNA clones in E. coli are known 
in the art (Chiang et al., Plant Cell 7:195-201 (1995), Phillips et al., Plant 
Physiol. 108:1049-1057 (1995), Wu et al., Plant Physiol. 110:547-554 (1996), 
Yamaguchi et al., Plant J. 10:203-213 (1996)). Plasmids containing DNA 
encoding a GA4H protein are transformed into DE3 lysogenic E. coli strain 
BL21(DE3) (Studier et al., Methods Enzymol. 185:60-89 (1990)). The expression 
of the GA4H cDNA is induced by the addition of 0.4 mM 
isopropyl-β-D-thiogalactopyranoside (IPTG) at absorbance (600 nm)=0.8 with 
2 hour incubation at 37°C. Thirty ml of cell cultures are harvested by 
centrifugation, washed and resuspended in 10 ml of 50 mM Tris (pH 8.0), 
2 mM EDTA. The cells are sonicated on ice with a Branson microtip at a setting 
of 4, with four 20-sec pulses. The sonicate is mixed with 1% Triton X-100, 
incubated on ice for 5 min and then centrifuged at 12000 g for 10 min at 4°C to 
isolate inclusion bodies (Marston, DNA Cloning: A Practical Approach, Oxford 
England: IRL Press, 1987, with slight modification). 

Alternatively, full-length cDNA clones may be expressed as fusion 
proteins similar to Phillips et al. (Plant Physiol. 108:1049-1057, 1995) by using, 
for example, an Invitrogen (San Diego, CA) Xpress Kit. 

The GA4H proteins are purified from the inclusion body fraction of E. 
coli extracts by SDS-polyacrylamide gel electrophoresis, and electroelution with 
the Electro-separation system (Schleicher & Schuell). Other methods routinely 
used by those of skill in the art of protein purification can also be used. The purified 
proteins are detected as single bands on SDS-polyacrylamide gels by Coomassie 
Blue staining. Rabbit antibodies to GA4H proteins are obtained by subcutaneous 
injection of gel-purified proteins in complete Freund's adjuvant (Harlow and 
Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor, NY, Cold Spring 
Harbor Laboratory, 1988). For N-group analysis, proteins are fractionated by 
SDS-polyacrylamide gel electrophoresis and then transferred to Immobilon 
membrane (Millipore) in Tris-Glycine and 10% methanol. The membrane is first 
stained with Ponceau S, destained in deionized water and the appropriate protein 
bands excised for N-group analysis. 

The antibodies obtained should be useful for identifying cells or tissues 
expressing GA4H. A method to accomplish this objective comprises the steps of: 
a) incubating said cells or said tissues with an agent capable of binding to the 
GA4H protein or the RNA encoding GA4H; and b) detecting the presence of the 
bound agent. 

Example 9 

Modulating the Translation of RNA Encoding GA4H Protein 

The translation of RNA encoding GA4H protein in a plant is modulated 
by generating an expression vector encoding antisense GA4H RNA. The plant 
is then transfected with the expression vector encoding the antisense GA4H RNA. 
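
As an illustration of what "antisense" means at the sequence level, the following minimal sketch derives the reverse complement of a sense-strand DNA fragment. The function name `antisense` is hypothetical and is not part of the disclosure; the sample fragment is simply the first six codons of the coding region listed for SEQ ID NO: 1, used here only as input data. Construction of an actual antisense expression vector involves cloning steps that this sketch does not model.

```python
# Hypothetical helper (not from the specification): compute the antisense
# (reverse-complement) strand of a sense-strand DNA fragment.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense(sense_dna: str) -> str:
    """Return the reverse complement of a DNA string, ignoring whitespace."""
    cleaned = "".join(sense_dna.split())      # drop spaces between codons
    return cleaned.translate(_COMPLEMENT)[::-1]


# Sample input: first six codons of the coding region of SEQ ID NO: 1.
fragment = "ATG CCT GCT ATG TTA ACA"
print(antisense(fragment))
```

Applying the function twice returns the original fragment, which is a quick sanity check that the sense and antisense strands are mutually complementary.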

Example 10 
Cloning DNA Encoding GA4H Protein 

A DNA molecule encoding the GA4H protein is cloned by hybridizing a 
desired DNA molecule to the sequences or antisense sequences of, for example, 
DNA SEQ ID No. 5 or DNA SEQ ID No. 6 under stringent hybridization 
conditions. Those DNA molecules hybridizing to the probe sequences are 
selected and transformed into a host cell. The transformants that express GA4H 
are selected and cloned. 

One possible set of hybridization conditions for the cloning of the DNA 
encoding GA4H protein is as follows: 

1) prehybridizing for 1 hour; 

2) hybridizing overnight at 65 °C in the hybridization buffer; and 

3) washing once for 15 minutes in 2xSSC at room temperature, then 
two times for 30 minutes in 0.1xSSC and 0.1% SDS at 60 °C. 
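
To make the difference in stringency between the two washes concrete, the sketch below estimates the sodium ion concentration of each wash buffer. It assumes the standard composition of 1xSSC (0.15 M NaCl plus 0.015 M trisodium citrate), which is not stated in this specification, and it ignores the small sodium contribution from SDS. The function name `sodium_molarity` is hypothetical.

```python
# Assumption (standard recipe, not stated in the specification):
# 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate.
NACL_M_PER_X = 0.15
CITRATE_M_PER_X = 0.015


def sodium_molarity(ssc_strength: float) -> float:
    """Approximate [Na+] in mol/L for an SSC solution of the given strength."""
    # Each NaCl contributes one Na+; each trisodium citrate contributes three.
    return ssc_strength * (NACL_M_PER_X + 3 * CITRATE_M_PER_X)


washes = [
    ("2xSSC, room temperature", 2.0),
    ("0.1xSSC + 0.1% SDS, 60 C", 0.1),
]
for label, strength in washes:
    print(f"{label}: about {sodium_molarity(strength):.4f} M Na+")
```

Under these assumptions the final washes are carried out at roughly one twentieth of the salt concentration of the first wash and at a higher temperature, which is what makes them the more stringent step.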
Example 11 
Stimulating Plant Stem Elongation 

Plant stem elongation is stimulated by inserting a DNA construct 
encoding the amino acid sequence of a GA4H protein into a transgenic plant. The 
transgenic plant is produced by any of several methods known in the art including 
those previously described in this specification. 

The stem elongation may be stimulated in Fragaria, Lotus, Medicago, 
Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, 
Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura, 
Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, 
Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, 
Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, 
Cucumis, Browallia, Glycine, Lolium, Zea, Triticum, Sorghum, Malus, Apium, and 
Datura. 

Example 12 
Producing Dwarf Plants 

Dwarf plants are produced by blocking the GA4H gene by homologous 
recombination, or by transforming with a GA4H anti-sense DNA in order to 
produce transgenic plants. A cDNA sequence can be used to construct the 
antisense construct which is then transformed into a plant by using an 
Agrobacterium vector (Zhang et al., Plant Cell 4:1575-1588 (Dec. 1992)). Even 
partial antisense sequences can be used as antisense and can interfere with the 
cognate endogenous genes (van der Krol et al., Plant Mol. Biol. 14:457-466 
(1990)). The plant is transformed with the antisense construct according to the 
protocol of Valvekens et al., Proc. Natl. Acad. Sci. USA 85:5536-5540 (1988). 

Dwarf plants are known to be commercially valuable. For example, dwarf 
trees for apples, cherries, peaches, pears and nectarines are commercially 
available (Burpee Gardens Catalogue 1994, pages 122-123). 
Example 13 
Molecular Weight Markers 

The GA4H1 and GA4H2 proteins produced recombinantly are purified by 
routine methods in the art (Current Protocols in Molecular Biology, Vol. 2, Chap. 
10, John Wiley & Sons, Publishers (1994)). Because the deduced amino acid 
sequence is known, the molecular weight of these proteins can be precisely 
determined, and the proteins can be used as molecular weight markers for gel 
electrophoresis. The calculated molecular weights of the GA4H1 and GA4H2 
proteins based on the deduced amino acid sequences are 39086 daltons and 38740 
daltons respectively. 
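
The calculation referred to above can be sketched as follows: sum the average residue masses of the deduced amino acids and add one water molecule for the free termini. This is a minimal illustration, not the method actually used by the applicants; the function name `molecular_weight` is hypothetical, and the mass table contains commonly tabulated average residue masses, so results may differ slightly from the figures quoted above depending on the table and rounding used.

```python
# Commonly tabulated average residue masses in daltons (amino acid minus water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Average molecular weight (Da) of a protein given in one-letter code."""
    residues = "".join(protein.split()).upper()
    if not residues:
        raise ValueError("empty sequence")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER


# Illustrative input only: the first 14 residues listed for SEQ ID NO: 2
# (Met Pro Ala Met Leu Thr Asp Val Phe Arg Gly His Pro Ile).
print(f"{molecular_weight('MPAMLTDVFRGHPI'):.1f} Da")
```

Passing the full one-letter translation of a deduced sequence to the same function gives the whole-protein estimate.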

Conclusions 

A genomic clone comprising the sequences encoding the GA4H1 and 
GA4H2 proteins was obtained. The GA4H1 and GA4H2 proteins are 
homologues of the GA4 protein. It is believed that the GA4 locus encodes a 
hydroxylase involved in gibberellin biosynthesis. 

All references mentioned herein are fully incorporated by reference into 
the disclosure. 

Having now fully described the invention by way of illustration and 
example for purposes of clarity and understanding, it will be apparent to those of 
ordinary skill in the art that certain changes and modifications may be made in the 
disclosed embodiments, and such modifications are intended to be within the 
scope of the present invention. 
SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: THE GENERAL HOSPITAL CORPORATION 
FRUIT STREET 
BOSTON, MA 02114 
UNITED STATES OF AMERICA 

APPLICANT/INVENTOR: GOODMAN, HOWARD M. 

NGUYEN, LONG V. 
CHIANG, HUI-HWA 

(ii) TITLE OF INVENTION: GA4 HOMOLOGUE DNA, PROTEIN AND METHODS 
OF USE 

(iii) NUMBER OF SEQUENCES: 29 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: STERNE, KESSLER, GOLDSTEIN & FOX P.L.L.C. 

(B) STREET: 1100 NEW YORK AVE., SUITE 600 
(C) CITY: WASHINGTON 

(D) STATE: DC 

(E) COUNTRY: USA 

(F) ZIP: 20005 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: To be assigned 

(B) FILING DATE: Herewith 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 60/050,615 

(B) FILING DATE: 24-JUN-1997 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: CIMBALA, MICHELE A. 

(B) REGISTRATION NUMBER: 33,851 

(C) REFERENCE/DOCKET NUMBER: 0609.439PC01/MAC/LBB 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (202)371-2600 

(B) TELEFAX: (202)371-2540 



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1228 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 
(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 67..1140 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 
ATAAGAAAAA AAACACAAAC ATCTATCAAA TTTACAAAGT TTTAAAACTA ATTAAAAAAG 60 

AGCAAG ATG CCT GCT ATG TTA ACA GAT GTG TTT AGA GGC CAT CCC ATT 108 
Met Pro Ala Met Leu Thr Asp Val Phe Arg Gly His Pro Ile 
1 5 10 

CAC CTC CCA CAC TCT CAC ATA CCT GAC TTC ACA TCT CTC CGG GAG CTC 156 
His Leu Pro His Ser His Ile Pro Asp Phe Thr Ser Leu Arg Glu Leu 
15 20 25 30 

CCG GAT TCT TAC AAG TGG ACC CCT AAA GAC GAT CTC CTC TTC TCC GCT 204 
Pro Asp Ser Tyr Lys Trp Thr Pro Lys Asp Asp Leu Leu Phe Ser Ala 
35 40 45 

GCT CCT TCT CCT CCG GCC ACC GGT GAA AAC ATC CCT CTC ATC GAC CTC 252 
Ala Pro Ser Pro Pro Ala Thr Gly Glu Asn Ile Pro Leu Ile Asp Leu 
50 55 60 

GAC CAC CCG GAC GCG ACT AAC CAA ATC GGT CAT GCA TGT AGA ACT TGG 300 
Asp His Pro Asp Ala Thr Asn Gln Ile Gly His Ala Cys Arg Thr Trp 
65 70 75 

GGT GCC TTC CAA ATC TCA AAC CAC GGC GTG CCT TTG GGA CTT CTC CAA 348 
Gly Ala Phe Gln Ile Ser Asn His Gly Val Pro Leu Gly Leu Leu Gln 
80 85 90 

GAC ATT GAG TTT CTC ACC GGT AGT CTC TTC GGG CTA CCT GTC CAA CGC 396 
Asp Ile Glu Phe Leu Thr Gly Ser Leu Phe Gly Leu Pro Val Gln Arg 
95 100 105 110 

AAG CTT AAG TCT GCT CGG TCG GAG ACA GGT GTG TCC GGC TAC GGC GTC 444 
Lys Leu Lys Ser Ala Arg Ser Glu Thr Gly Val Ser Gly Tyr Gly Val 
115 120 125 

GCT CGT ATC GCA TCT TTC TTC AAT AAG CAA ATG TGG TCC GAA GGT TTC 492 
Ala Arg Ile Ala Ser Phe Phe Asn Lys Gln Met Trp Ser Glu Gly Phe 
130 135 140 

ACC ATC ACT GGC TCG CCT CTC AAC GAT TTC CGT AAA CTT TGG CCC CAA 540 
Thr Ile Thr Gly Ser Pro Leu Asn Asp Phe Arg Lys Leu Trp Pro Gln 
145 150 155 

CAT CAC CTC AAC TAC TGC GAT ATC GTT GAA GAG TAC GAG GAA CAT ATG 588 
His His Leu Asn Tyr Cys Asp Ile Val Glu Glu Tyr Glu Glu His Met 
160 165 170 

AAA AAG TTG GCA TCG AAA TTG ATG TGG TTA GCA CTA AAT TCA CTT GGG 636 
Lys Lys Leu Ala Ser Lys Leu Met Trp Leu Ala Leu Asn Ser Leu Gly 
175 180 185 190 

GTC AGC GAA GAA GAC ATT GAA TGG GCC AGT CTC AGT TCA GAT TTA AAC 684 
Val Ser Glu Glu Asp Ile Glu Trp Ala Ser Leu Ser Ser Asp Leu Asn 
195 200 205 

TGG GCC CAA GCT GCT CTC CAG CTA AAT CAC TAC CCG GTT TGT CCT GAA 732 
Trp Ala Gln Ala Ala Leu Gln Leu Asn His Tyr Pro Val Cys Pro Glu 
210 215 220 

CCG GAC CGA GCC ATG GGT CTA GCA GCT CAT ACC GAC TCC ACC CTC CTA 780 
Pro Asp Arg Ala Met Gly Leu Ala Ala His Thr Asp Ser Thr Leu Leu 
225 230 235 

ACC ATT CTG TAC CAG AAC AAT ACC GCC GGT CTA CAA GTA TTT CGC GAT 828 
Thr Ile Leu Tyr Gln Asn Asn Thr Ala Gly Leu Gln Val Phe Arg Asp 
240 245 250 

GAT CTT GGT TGG GTC ACC GTG CCA CCG TTT CCT GGC TCG CTC GTG GTT 876 
Asp Leu Gly Trp Val Thr Val Pro Pro Phe Pro Gly Ser Leu Val Val 
255 260 265 270 

AAC GTT GGT GAC CTC TTC CAC ATC CTA TCC AAT GGA TTG TTT AAA AGC 924 
Asn Val Gly Asp Leu Phe His Ile Leu Ser Asn Gly Leu Phe Lys Ser 
275 280 285 

GTG TTG CAC CGC GCT CGG GTT AAC CAA ACC AGA GCC CGG TTA TCT GTA 972 
Val Leu His Arg Ala Arg Val Asn Gln Thr Arg Ala Arg Leu Ser Val 
290 295 300 

GCA TTC CTT TGG GGT CCG CAA TCT GAT ATC AAG ATA TCA CCT GTA CCG 1020 
Ala Phe Leu Trp Gly Pro Gln Ser Asp Ile Lys Ile Ser Pro Val Pro 
305 310 315 

AAG CTG GTT AGT CCC GTT GAA TCG CCT CTA TAC CAA TCG GTG ACA TGG 1068 
Lys Leu Val Ser Pro Val Glu Ser Pro Leu Tyr Gln Ser Val Thr Trp 
320 325 330 

AAA GAG TAT CTT CGA ACA AAA GCA ACT CAC TTC AAC AAA GCT CTT TCA 1116 
Lys Glu Tyr Leu Arg Thr Lys Ala Thr His Phe Asn Lys Ala Leu Ser 
335 340 345 350 

ATG ATT AGA AAT CAC AGA GAA GAA TGATTAGATA ATAATAGTTG TGATCTACTA 1170 
Met Ile Arg Asn His Arg Glu Glu 
355 

GTTAGTTTGA TTAATAAATT GTTGTAAATG ATTTCAGCAA TATGATTTGT TTGTCCTC 1228 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 358 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Pro Ala Met Leu Thr Asp Val Phe Arg Gly His Pro Ile His Leu 
1 5 10 15 

Pro His Ser His Ile Pro Asp Phe Thr Ser Leu Arg Glu Leu Pro Asp 
20 25 30 

Ser Tyr Lys Trp Thr Pro Lys Asp Asp Leu Leu Phe Ser Ala Ala Pro 
35 40 45 

Ser Pro Pro Ala Thr Gly Glu Asn Ile Pro Leu Ile Asp Leu Asp His 
50 55 60 

Pro Asp Ala Thr Asn Gln Ile Gly His Ala Cys Arg Thr Trp Gly Ala 
65 70 75 80 

Phe Gln Ile Ser Asn His Gly Val Pro Leu Gly Leu Leu Gln Asp Ile 
85 90 95 

Glu Phe Leu Thr Gly Ser Leu Phe Gly Leu Pro Val Gln Arg Lys Leu 
100 105 110 

Lys Ser Ala Arg Ser Glu Thr Gly Val Ser Gly Tyr Gly Val Ala Arg 
115 120 125 

Ile Ala Ser Phe Phe Asn Lys Gln Met Trp Ser Glu Gly Phe Thr Ile 
130 135 140 

Thr Gly Ser Pro Leu Asn Asp Phe Arg Lys Leu Trp Pro Gln His His 
145 150 155 160 

Leu Asn Tyr Cys Asp Ile Val Glu Glu Tyr Glu Glu His Met Lys Lys 
165 170 175 

Leu Ala Ser Lys Leu Met Trp Leu Ala Leu Asn Ser Leu Gly Val Ser 
180 185 190 

Glu Glu Asp Ile Glu Trp Ala Ser Leu Ser Ser Asp Leu Asn Trp Ala 
195 200 205 

Gln Ala Ala Leu Gln Leu Asn His Tyr Pro Val Cys Pro Glu Pro Asp 
210 215 220 

Arg Ala Met Gly Leu Ala Ala His Thr Asp Ser Thr Leu Leu Thr Ile 
225 230 235 240 

Leu Tyr Gln Asn Asn Thr Ala Gly Leu Gln Val Phe Arg Asp Asp Leu 
245 250 255 

Gly Trp Val Thr Val Pro Pro Phe Pro Gly Ser Leu Val Val Asn Val 
260 265 270 

Gly Asp Leu Phe His Ile Leu Ser Asn Gly Leu Phe Lys Ser Val Leu 
275 280 285 

His Arg Ala Arg Val Asn Gln Thr Arg Ala Arg Leu Ser Val Ala Phe 
290 295 300 

Leu Trp Gly Pro Gln Ser Asp Ile Lys Ile Ser Pro Val Pro Lys Leu 
305 310 315 320 

Val Ser Pro Val Glu Ser Pro Leu Tyr Gln Ser Val Thr Trp Lys Glu 
325 330 335 

Tyr Leu Arg Thr Lys Ala Thr His Phe Asn Lys Ala Leu Ser Met Ile 
340 345 350 

Arg Asn His Arg Glu Glu 
355 

(2) INFORMATION FOR SEQ ID NO: 3: 




(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 159 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
CTTAAGTCTG CTCGGTCGGA GACAGGTGTG TCCGGCTACG GCGTCGCTCG TATCGCATCT 60 
TTCTTCAATA AGCAAATGTG GTCCGAAGGT TTCACCATCA CTGGCTCGCC TCTCAACGAT 120 
TTCCGTAAAC TTTGGCCCCA ACATCACCTC AACTACTGC 159 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 140 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GTGGTTAGCA CTAAATTCAC TTGGGGTCAG CGAAGAAGAC ATTGAATGGG CCAGTCTCAG 60 
TTCAGATTTA AACTGGGCCC AAGCTGCTCT CCAGCTAAAT CACTACCCGG TTTGTCCTGA 120 
ACCGGACCGA GCCATGGGTC 140 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1133 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

GTTTATGTGA TGAGCATCCC ATTCTCTCAT TAGTTCACAA GTCATGCCTT CACTAGCAGA 60 
AGAGATATGT ATTGGTAACT TAGGCAGTCT CCAAACACTC CCCGAGTCGT TCACCTGGAA 120 
ACTCACAGCC GCCGACTCCC TTCTGCGTCC CTCCTCCGCC GTCTCATTCG ACGCAGTGGA 180 
AGAGTCCATT CCTGTGATCG ACCTCTCTAA TCCTGACGTT ACCACCCTCA TTGGAGATGC 240 
CTCCAAAACA TGGGGAGCGT TTCAGATAGC CAACCACGGG ATTTCTCAGA AGCTTCTCGA 300 
TGATATCGAG TCTCTGTCCA AAACCCTATT CGACATGCCG TCAGAGAGGA AGCTTGAAGC 360 
GGCTTCCTCC GATAAAGGAG TTAGTGGCTA CGGAGAACCT CGAATCTCCC CCTTTTTCGA 420 
GAAGAAAATG TGGTCTGAAG GGTTTACTAT TGCCGATGAC TCCTACCGCA ACCATTTCAA 480 
TACTCTTTGG CCTCATGATC ACACCAAGTA CTGCGGTATA ATCCAAGAAT ACGTGGACGA 540 
AATGGAAAAA TTAGCAAGCA GACTTCTGTA TTGCACATTA GGCTCACTTG GTGTCACCGT 600 
GGAAGACATT GAATGGGCTC ACAAGCTAGA GAAATCTGGA TCAAAAGTGG GCAGAGGCGC 660 
CATACGACTA AACCACTACC CGGTTTGTCC TGAACCAGAA CGAGCCATGG GTCTAGCCGC 720 
TCATACAGAC TCCACTATCC TAACCATTCT GCACCAGAGC AACACGGGAG GGCTACAAGT 780 
GTTCAGGGAA GAGTCCGGTT GGGTCACGGT TGAGCCGGCT CCTGGTGTCC TCGTGGTCAA 840 
CATGGGTGAT CTCTTTCACA TCTTATCGAA CGGGAAAATC CCAAGCGTGG TTCATCGAGC 900 
CAAAGTTAAC CATACTCGGT CAAGAATTTC GATTGCGTAC TTATGGGGTG GTCCAGCTGG 960 
TGATGTGCAA ATCGCACCTA TCTCTAAGTT AACCGGTCCG GCTGAACCGT CTCTTTACCG 1020 
GTCAATTACA TGGAAAGAGT ATCTCCAAAT AAAGTATGGG GTTTTCGACA AGGCCATGGA 1080 
CGCAATTAGG GTCGTTAATC CCACCAATTA AATCTCCTTC TCAAATACTC TCT 1133 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1610 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 86..556 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 966..1559 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

ACATATGTGT GTAGTATCTA TGCATATATA TCCAAAGTAA TTGTTTATGT GATGAGCATC 60 

CCATTCTCTC ATTAGTTCAC AAGTC ATG CCT TCA CTA GCA GAA GAG ATA TGT 112 
Met Pro Ser Leu Ala Glu Glu Ile Cys 
1 5 

ATT GGT AAC TTA GGC AGT CTC CAA ACA CTC CCC GAG TCG TTC ACC TGG 160 
Ile Gly Asn Leu Gly Ser Leu Gln Thr Leu Pro Glu Ser Phe Thr Trp 
10 15 20 25 

AAA CTC ACA GCC GCC GAC TCC CTT CTG CGT CCC TCC TCC GCC GTC TCA 208 
Lys Leu Thr Ala Ala Asp Ser Leu Leu Arg Pro Ser Ser Ala Val Ser 
30 35 40 

TTC GAC GCA GTG GAA GAG TCC ATT CCT GTG ATC GAC CTC TCT AAT CCT 256 
Phe Asp Ala Val Glu Glu Ser Ile Pro Val Ile Asp Leu Ser Asn Pro 
45 50 55 

GAC GTT ACC ACC CTC ATT GGA GAT GCC TCC AAA ACA TGG GGA GCG TTT 304 
Asp Val Thr Thr Leu Ile Gly Asp Ala Ser Lys Thr Trp Gly Ala Phe 
60 65 70 

CAG ATA GCC AAC CAC GGG ATT TCT CAG AAG CTT CTC GAT GAT ATC GAG 352 
Gln Ile Ala Asn His Gly Ile Ser Gln Lys Leu Leu Asp Asp Ile Glu 
75 80 85 

TCT CTG TCC AAA ACC CTA TTC GAC ATG CCG TCA GAG AGG AAG CTT GAA 400 
Ser Leu Ser Lys Thr Leu Phe Asp Met Pro Ser Glu Arg Lys Leu Glu 
90 95 100 105 

GCG GCT TCC TCC GAT AAA GGA GTT AGT GGC TAC GGA GAA CCT CGA ATC 448 
Ala Ala Ser Ser Asp Lys Gly Val Ser Gly Tyr Gly Glu Pro Arg Ile 
110 115 120 

TCC CCC TTT TTC GAG AAG AAA ATG TGG TCT GAA GGG TTT ACT ATT GCC 496 
Ser Pro Phe Phe Glu Lys Lys Met Trp Ser Glu Gly Phe Thr Ile Ala 
125 130 135 

GAT GAC TCC TAC CGC AAC CAT TTC AAT ACT CTT TGG CCT CAT GAT CAC 544 
Asp Asp Ser Tyr Arg Asn His Phe Asn Thr Leu Trp Pro His Asp His 
140 145 150 

ACC AAG TAC TGG TAACGTCTAT TACACACACA TATATATATT TTTTGCTTAT 596 
Thr Lys Tyr Trp 
155 

TTCGCAAAAG TGTGGCAAAG GAAATTGCAC ACTTTTTTTT TGCACTAAGA CTTAGTTATT 656 

ATTAAAAGTG TTTAAATGTT TTTTTCTGTT CATAAAAAAG TGTTTATATG TTCCGAGTAA 716 

TTGATGTTTA TGATTAGTGA TAACTGATAA CACATAGAGT GTAGCCTTCA AAGTTTCTAA 776 

TTAAATAGTT TGAGCAACAT CCTTATATTT TATGAAGTAG TACTTCTTAT TGCATATTAC 836 

AGCAAATTAA AGTACCAAAG TCTCTATGAA ATGTGATAAT TTGGCTAATG TCGAGGTCTT 896 

AACATTAGAT TACCAAAAAC CTTAATTACT GTAAATTGTA TTTGCTTTTC ATTTTTGGTA 956 

TTGTGCAGC GGT ATA ATC CAA GAA TAC GTG GAC GAA ATG GAA AAA TTA 1004 
Gly Ile Ile Gln Glu Tyr Val Asp Glu Met Glu Lys Leu 
1 5 10 

GCA AGC AGA CTT CTG TAT TGC ACA TTA GGC TCA CTT GGT GTC ACC GTG 1052 
Ala Ser Arg Leu Leu Tyr Cys Thr Leu Gly Ser Leu Gly Val Thr Val 
15 20 25 

GAA GAC ATT GAA TGG GCT CAC AAG CTA GAG AAA TCT GGA TCA AAA GTG 1100 
Glu Asp Ile Glu Trp Ala His Lys Leu Glu Lys Ser Gly Ser Lys Val 
30 35 40 45 

GGC AGA GGC GCC ATA CGA CTA AAC CAC TAC CCG GTT TGT CCT GAA CCA 1148 
Gly Arg Gly Ala Ile Arg Leu Asn His Tyr Pro Val Cys Pro Glu Pro 
50 55 60 

GAA CGA GCC ATG GGT CTA GCC GCT CAT ACA GAC TCC ACT ATC CTA ACC 1196 
Glu Arg Ala Met Gly Leu Ala Ala His Thr Asp Ser Thr Ile Leu Thr 
65 70 75 

ATT CTG CAC CAG AGC AAC ACG GGA GGG CTA CAA GTG TTC AGG GAA GAG 1244 
Ile Leu His Gln Ser Asn Thr Gly Gly Leu Gln Val Phe Arg Glu Glu 
80 85 90 

TCC GGT TGG GTC ACG GTT GAG CCG GCT CCT GGT GTC CTC GTG GTC AAC 1292 
Ser Gly Trp Val Thr Val Glu Pro Ala Pro Gly Val Leu Val Val Asn 
95 100 105 

ATG GGT GAT CTC TTT CAC ATC TTA TCG AAC GGG AAA ATC CCA AGC GTG 1340 
Met Gly Asp Leu Phe His Ile Leu Ser Asn Gly Lys Ile Pro Ser Val 
110 115 120 125 

GTT CAT CGA GCC AAA GTT AAC CAT ACT CGG TCA AGA ATT TCG ATT GCG 1388 
Val His Arg Ala Lys Val Asn His Thr Arg Ser Arg Ile Ser Ile Ala 
130 135 140 

TAC TTA TGG GGT GGT CCA GCT GGT GAT GTG CAA ATC GCA CCT ATC TCT 1436 
Tyr Leu Trp Gly Gly Pro Ala Gly Asp Val Gln Ile Ala Pro Ile Ser 
145 150 155 

AAG TTA ACC GGT CCG GCT GAA CCG TCT CTT TAC CGG TCA ATT ACA TGG 1484 
Lys Leu Thr Gly Pro Ala Glu Pro Ser Leu Tyr Arg Ser Ile Thr Trp 
160 165 170 

AAA GAG TAT CTC CAA ATA AAG TAT GAG GTT TTC GAC AAG GCC ATG GAC 1532 
Lys Glu Tyr Leu Gln Ile Lys Tyr Glu Val Phe Asp Lys Ala Met Asp 
175 180 185 

GCA ATT AGG GTC GTT AAT CCC ACC AAT TAAATCTCCT TCTCAAATAC 1579 
Ala Ile Arg Val Val Asn Pro Thr Asn 
190 195 

TCTCTTAATG AAAAACCTAA ATTAAATGCG A 1610 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 157 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

Met Pro Ser Leu Ala Glu Glu Ile Cys Ile Gly Asn Leu Gly Ser Leu 
1 5 10 15 

Gln Thr Leu Pro Glu Ser Phe Thr Trp Lys Leu Thr Ala Ala Asp Ser 
20 25 30 

Leu Leu Arg Pro Ser Ser Ala Val Ser Phe Asp Ala Val Glu Glu Ser 
35 40 45 

Ile Pro Val Ile Asp Leu Ser Asn Pro Asp Val Thr Thr Leu Ile Gly 
50 55 60 

Asp Ala Ser Lys Thr Trp Gly Ala Phe Gln Ile Ala Asn His Gly Ile 
65 70 75 80 

Ser Gln Lys Leu Leu Asp Asp Ile Glu Ser Leu Ser Lys Thr Leu Phe 
85 90 95 

Asp Met Pro Ser Glu Arg Lys Leu Glu Ala Ala Ser Ser Asp Lys Gly 
100 105 110 

Val Ser Gly Tyr Gly Glu Pro Arg Ile Ser Pro Phe Phe Glu Lys Lys 
115 120 125 

Met Trp Ser Glu Gly Phe Thr Ile Ala Asp Asp Ser Tyr Arg Asn His 
130 135 140 

Phe Asn Thr Leu Trp Pro His Asp His Thr Lys Tyr Trp 
145 150 155 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 198 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Gly Ile Ile Gln Glu Tyr Val Asp Glu Met Glu Lys Leu Ala Ser Arg 
1 5 10 15 

Leu Leu Tyr Cys Thr Leu Gly Ser Leu Gly Val Thr Val Glu Asp Ile 
20 25 30 

Glu Trp Ala His Lys Leu Glu Lys Ser Gly Ser Lys Val Gly Arg Gly 
35 40 45 

Ala Ile Arg Leu Asn His Tyr Pro Val Cys Pro Glu Pro Glu Arg Ala 
50 55 60 

Met Gly Leu Ala Ala His Thr Asp Ser Thr Ile Leu Thr Ile Leu His 
65 70 75 80 

Gln Ser Asn Thr Gly Gly Leu Gln Val Phe Arg Glu Glu Ser Gly Trp 
85 90 95 

Val Thr Val Glu Pro Ala Pro Gly Val Leu Val Val Asn Met Gly Asp 
100 105 110 

Leu Phe His Ile Leu Ser Asn Gly Lys Ile Pro Ser Val Val His Arg 
115 120 125 

Ala Lys Val Asn His Thr Arg Ser Arg Ile Ser Ile Ala Tyr Leu Trp 
130 135 140 

Gly Gly Pro Ala Gly Asp Val Gln Ile Ala Pro Ile Ser Lys Leu Thr 
145 150 155 160 

Gly Pro Ala Glu Pro Ser Leu Tyr Arg Ser Ile Thr Trp Lys Glu Tyr 
165 170 175 

Leu Gln Ile Lys Tyr Glu Val Phe Asp Lys Ala Met Asp Ala Ile Arg 
180 185 190 

Val Val Asn Pro Thr Asn 
195 

(2) INFORMATION FOR SEQ ID NO: 9: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1105 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 



TCATACCAAA AACATAAAGC CAAAATATAA ACACATAAGC CTTTTAGCAT GAGTTCAACG 60 
TTGAGCGATG TGTTTAGATC GCATCCCATT CACATCCCAC TCTCAAACCC ACCTGACTTC 120 
AAATCTCTCC CGGATTCTTA CACGTGGACT CCTAAAGATG ATCTCCTCTT CTCCGCCTCC 180 
GCCTCCGACG AAACCCTGCC GCTCATCGAC CTCTCCGATA TCCACGTGGC CACTCTTGTG 240 
GGCCATGCTT GTACCACGTG GGGAGCGTTC CAGATCACCA ACCACGGCGT CCCCTCGCGA 300 
CTTCTCGACG ACATTGAGTT CCTCACCGGA AGTCTTTTCC GGCTTCCCGT ACAGCGGAAG 360 
CTCAAGGCGG CTCGGTCAGA GAATGGCGTC TCCGGCTACG GCGTAGCTCG TATTGCTTCG 420 
TTCTTTAATA AGAAGATGTG GTCCGAAGGT TTCACCGTTA TTGGCTCTCC CCTCCACGAT 480 
TTCCGTAAAC TCTGGCCCAG CCACCACCTC AAATACTGTG AAATTATTGA AGAGTATGAA 540 
GAACATATGC AAAAGTTGGC AGCCAAGTTG ATGTGGTTCG CATTAGGTTC ACTGGGAGTT 600 
GAAGAAAAGG ACATACAATG GGCCGGGCCT AATTCAGACT TTCAAGGAAC CCAAGCAGCT 660 
ATCCAACTAA ACCATTATCC AAAATGTCCA GAACCAGACA GAGCCATGGG CCTCGCAGCC 720 
CATACAGACT CGACCCTCAT GACCATTCTG TACCAGAACA ACACCGCCGG TCTCCAAGTT 780 
TTCCGGGATG ACGTGGGCTG GGTTACCGCG CCACCTGTCC CTGGCTCGCT GGTGGTCAAC 840 
GTCGGTGACT TGCTCCACAT TTTAACCAAC GGAATCTTCC CGAGCGTGCT TCACCGAGCC 900 
AGGGTTAACC ACGTCCGATC TCGGTTCTCA ATGGCTTACC TGTGGGGTCC ACCATCCGAT 960 
GTAATGATCT CTCCACTTCC CAAACTGGTT GATCCTCTCC AATCTCCTCT CTACCCATCT 1020 
CTCACTTGGA AACAATACCT TGCTACCAAA GCTACTCATT TTAATCAATC TCTTTCCATT 1080 
ATTAGAAATT AACTGTCTTC CGACT 1105 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1690 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 95..565 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 986..1555 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

TCACCGATCT ATAAATACAC TCCTCTTCTC CACCAAAAGT ATCATATCAT ACCAAAAACA 60 

TAAAGCCAAA ATATAAACAC ATAAGCCTTT TAGC ATG AGT TCA ACG TTG AGC 112 

Met Ser Ser Thr Leu Ser 
1 5 

GAT GTG TTT AGA TCG CAT CCC ATT CAC ATC CCA CTC TCA AAC CCA CCT 160 
Asp Val Phe Arg Ser His Pro He His He Pro Leu Ser Asn Pro Pro 
10 15 20 

GAC TTC AAA TCT CTC CCG GAT TCT TAC ACG TGG ACT CCT AAA GAT GAT 208 
Asp Phe Lys Ser Leu Pro Asp Ser Tyr Thr Trp Thr Pro Lys Asp Asp 
25 30" 35 

CTC CTC TTC TCC GCC TCC GCC TCC GAC GAA ACC CTG CCG CTC ATC GAC 256 
Leu Leu Phe Ser Ala Ser Ala Ser Asp Glu Thr Leu Pro Leu He Asp 
40 45 50 

CTC TCC GAT ATC CAC GTG GCC ACT CTT GTG GGC CAT GCT TGT ACC ACG 304 
Leu Ser Asp He His Val Ala Thr Leu Val Gly His Ala Cys Thr Thr 
55 60 65 ~ 70 

TGG GGA GCG TTC CAG ATC ACC AAC' CAC GGC GTC CCC TCG CGA CTT CTC 352 
Trp Gly Ala Phe Gin He Thr Asn His Gly Val Pro Ser Arg Leu Leu 
75 80 85 

GAC GAC ATT GAG TTC CTC ACC GGA AGT CTT TTC CGG CTT CCC GTA CAG 4 00 

Asp Asp He Glu Phe Leu Thr Gly Ser Leu Phe Arg Leu Pro Val Gin 
90 95 " 100 

CGG AAG CTC AAG GCG GCT CGG TCA GAG AAT GGC GTC TCC GGC TAC GGC 4 48 

Arg Lys Leu Lys Ala Ala Arg Ser Glu Asn Gly Val Ser Gly Tyr Gly 
105 110 " H5 

GTA GCT CGT ATT GCT TCG TTC TTT AAT AAG AAG ATG TGG TCC GAA GGT 4 96 

Val Ala Arg He Ala Ser Phe Phe Asn Lys Lys Met Trp Ser Glu Gly 
120 125 130 



TTC ACC GTT ATT GGC TCT CCC CTC CAC GAT TTC CGT AAA CTC TGG CCC 
Phe Thr Val He Gly Ser Pro Leu His Asp Phe Arg Lys Leu Trp Pro 



544 
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135 


140 


145 


150 




AGC CAC CAC CTC AAA TAC TGG TATCTTTTTC AATGGTTCAT TTTATCAACG 
Ser His His Leu Lys Tyr Trp 
155 


595 


TTAAGACCAT 


ATTAACGTAA 


CGTAACTTAT 


CTTTGTATGA 


AAAAAAAAAA AAAAACTGTG 


655 


GACGTTAGTA 


CAGTTGACTA 


TTCAATTGAT 


ATAGATTCGG 


GAATAATACG AAAAGGGTAA 


715 


AGTAGAAACC 


ATTTTTTGCC 


ATGTCGTAGT 


TAGTAAAAAG 


CACAATGAAA ACTCATGGAC 


775 


CCACCAAAAA 


GAT T AC AT G A 


TATAATATAT 


ATATATATAT 


TTATATAAAT ATTATATAAT 


835 


ATATTTATAT 


AATATTATGT 


GCAAAAATTA 


AATGAAAATA 


AATATTATCA GGAGAATGTG 


pa; 

o 3 D 


AAATACAGTA 


TAAGATTTTC 


CTTTGGCTAC 


ATGACGATTT 


CTATAGATTT GAAGGTTAAG 


955 


ATACTAATTT 


CAT AT TAT CG 


ATTCAACAGT 


GAA ATT ATT 
Glu lie He 
1 


GAA GAG TAT GAA GAA 
Glu Glu Tyr Glu Glu 
5 


10C 9 



CAT ATG CAA AAG TTG GCA GCC AAG TTG ATG TGG TTC GCA TTA GGT TCA 1057
His Met Gln Lys Leu Ala Ala Lys Leu Met Trp Phe Ala Leu Gly Ser
10 15 20

CTG GGA GTT GAA GAA AAG GAC ATA CAA TGG GCC GGG CCT AAT TCA GAC 1105
Leu Gly Val Glu Glu Lys Asp Ile Gln Trp Ala Gly Pro Asn Ser Asp
25 30 35 40

TTT CAA GGA ACC CAA GCA GCT ATC CAA CTA AAC CAT TAT CCA AAA TGT 1153
Phe Gln Gly Thr Gln Ala Ala Ile Gln Leu Asn His Tyr Pro Lys Cys
45 50 55

CCA GAA CCA GAC AGA GCC ATG GGC CTC GCA GCC CAT ACA GAC TCG ACC 1201
Pro Glu Pro Asp Arg Ala Met Gly Leu Ala Ala His Thr Asp Ser Thr
60 65 70

CTC ATG ACC ATT CTG TAC CAG AAC AAC ACC GCC GGT CTC CAA GTT TTC 1249
Leu Met Thr Ile Leu Tyr Gln Asn Asn Thr Ala Gly Leu Gln Val Phe
75 80 85

CGG GAT GAC GTG GGC TGG GTT ACC GCG CCA CCT GTC CCT GGC TCG CTG 1297
Arg Asp Asp Val Gly Trp Val Thr Ala Pro Pro Val Pro Gly Ser Leu
90 95 100

GTG GTC AAC GTC GGT GAC TTG CTC CAC ATT TTA ACC AAC GGA ATC TTC 1345
Val Val Asn Val Gly Asp Leu Leu His Ile Leu Thr Asn Gly Ile Phe
105 110 115 120

CCG AGC GTG CTT CAC CGA GCC AGG GTT AAC CAC GTC CGA TCT CGG TTC 1393
Pro Ser Val Leu His Arg Ala Arg Val Asn His Val Arg Ser Arg Phe
125 130 135

TCA ATG GCT TAC CTG TGG GGT CCA CCA TCC GAT GTA ATG ATC TCT CCA 1441
Ser Met Ala Tyr Leu Trp Gly Pro Pro Ser Asp Val Met Ile Ser Pro
140 145 150

CTT CCC AAA CTG GTT GAT CCT CTC CAA TCT CCT CTC TAC CCA TCT CTC 1489
Leu Pro Lys Leu Val Asp Pro Leu Gln Ser Pro Leu Tyr Pro Ser Leu
155 160 165

ACT TGG AAA CAA TAC CTT GCT ACC AAA GCT ACT CAT TTT AAT CAA TCT 1537
Thr Trp Lys Gln Tyr Leu Ala Thr Lys Ala Thr His Phe Asn Gln Ser
170 175 180

CTT TCC ATT ATT AGA AAT TAACTGTCTT CCGACTGAAT TTCTTGATTT 1585
Leu Ser Ile Ile Arg Asn
185 190

TCAGATTTTA CTATTTATTT TCTTAGTAAT ATGATGATAT CTATTACTGT TTCGATTTTA 1645
GATGAGTGGT TCTTCAAATT CACAATTAGT AGCTTAATAT TGATT 1690

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 157 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Ser Ser Thr Leu Ser Asp Val Phe Arg Ser His Pro Ile His Ile
1 5 10 15

Pro Leu Ser Asn Pro Pro Asp Phe Lys Ser Leu Pro Asp Ser Tyr Thr
20 25 30

Trp Thr Pro Lys Asp Asp Leu Leu Phe Ser Ala Ser Ala Ser Asp Glu
35 40 45

Thr Leu Pro Leu Ile Asp Leu Ser Asp Ile His Val Ala Thr Leu Val
50 55 60

Gly His Ala Cys Thr Thr Trp Gly Ala Phe Gln Ile Thr Asn His Gly
65 70 75 80

Val Pro Ser Arg Leu Leu Asp Asp Ile Glu Phe Leu Thr Gly Ser Leu
85 90 95

Phe Arg Leu Pro Val Gln Arg Lys Leu Lys Ala Ala Arg Ser Glu Asn
100 105 110

Gly Val Ser Gly Tyr Gly Val Ala Arg Ile Ala Ser Phe Phe Asn Lys
115 120 125

Lys Met Trp Ser Glu Gly Phe Thr Val Ile Gly Ser Pro Leu His Asp
130 135 140

Phe Arg Lys Leu Trp Pro Ser His His Leu Lys Tyr Trp
145 150 155

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 190 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 
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(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Glu Ile Ile Glu Glu Tyr Glu Glu His Met Gln Lys Leu Ala Ala Lys
1 5 10 15

Leu Met Trp Phe Ala Leu Gly Ser Leu Gly Val Glu Glu Lys Asp Ile
20 25 30

Gln Trp Ala Gly Pro Asn Ser Asp Phe Gln Gly Thr Gln Ala Ala Ile
35 40 45

Gln Leu Asn His Tyr Pro Lys Cys Pro Glu Pro Asp Arg Ala Met Gly
50 55 60

Leu Ala Ala His Thr Asp Ser Thr Leu Met Thr Ile Leu Tyr Gln Asn
65 70 75 80

Asn Thr Ala Gly Leu Gln Val Phe Arg Asp Asp Val Gly Trp Val Thr
85 90 95

Ala Pro Pro Val Pro Gly Ser Leu Val Val Asn Val Gly Asp Leu Leu
100 105 110

His Ile Leu Thr Asn Gly Ile Phe Pro Ser Val Leu His Arg Ala Arg
115 120 125

Val Asn His Val Arg Ser Arg Phe Ser Met Ala Tyr Leu Trp Gly Pro
130 135 140

Pro Ser Asp Val Met Ile Ser Pro Leu Pro Lys Leu Val Asp Pro Leu
145 150 155 160

Gln Ser Pro Leu Tyr Pro Ser Leu Thr Trp Lys Gln Tyr Leu Ala Thr
165 170 175

Lys Ala Thr His Phe Asn Gln Ser Leu Ser Ile Ile Arg Asn
180 185 190

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GTGGTTAGCA CTAAATTCAC 20 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 
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(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
GACCCATGGC TCGGTCCGGT 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GCTCTAGAGA GTATTTGAGA AGG 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GTTTACTATT GCCGATGACT 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CAATACCAAA AATGAAAAGC 
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(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 
CTCCTACCGC AACCATTTC 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19 
TCCCCCGGGT TTATGTGATG AGCATCCC 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20 
CCAAAGTAAT TGTTTATGTG 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21 
AATTTAGGTT TTTCATTAAG 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22 
GTAGTGGTTT AGTCGTATGG 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 
AAAACTTGGA GACCGGCGG 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 
TATCATGTAA TCTTTTTGG 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 
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(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25 
CCGGCTTCCC GTACAGCGG 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26 
AATCAAGAAA TTCAGTCGG 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 
GGAATTCATA CCAAAAACAT AAAGCC 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
CTAGTTTCTT TCTTCCACG 
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(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 
TAGCTGCATC TTCTTTACC 19
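In the annotated sequences of the listing above, each codon line is paired with a residue line, and the right-hand counter gives the cumulative nucleotide position at the end of the line. The following Python sketch is offered only as an illustration and is not part of the disclosed methods. It assumes the standard genetic code, and the helper names `translate` and `end_position` are hypothetical. It shows how a codon line can be checked against its residue line and counter.

```python
# Standard genetic code, enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_1[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter residue codes ("End" marks a stop codon).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "End",
}


def translate(dna):
    """Translate a DNA string (whitespace ignored) into three-letter codes.

    Trailing bases that do not fill a complete codon are ignored.
    """
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return [THREE_LETTER[CODON_TABLE[dna[i:i + 3]]] for i in range(0, usable, 3)]


def end_position(previous_end, line):
    """Return the cumulative nucleotide count after one more sequence line."""
    return previous_end + len("".join(line.split()))


if __name__ == "__main__":
    codon_line = "CAT ATG CAA AAG TTG GCA GCC AAG TTG ATG TGG TTC GCA TTA GGT TCA"
    print(" ".join(translate(codon_line)))
    print(end_position(1009, codon_line))
```

Applied to the line ending at position 1057 in SEQ ID NO: 10, the sketch reproduces the residue line His Met Gln Lys ... Gly Ser and the counter 1057, given that the preceding line ends at position 1009.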
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INDICATIONS RELATING TO A DEPOSITED MICROORGANISM 

(PCT Rule 13bis)



A. The indications made below relate to the microorganism referred to in the description on page 
line 25-26 . 



B. IDENTIFICATION OF DEPOSIT 



Further deposits are identified on an additional sheet □ 



Name of depositary institution 

American Type Culture Collection 



Address of depositary institution (including postal code and country) 

12301 Parklawn Drive
Rockville, Maryland 20852
United States of America

Now at:
10801 University Boulevard
Manassas, Virginia 20110-2209
United States of America



Date of deposit 

May 20, 1997 



Accession Number 
ATCC 98436 



C. ADDITIONAL INDICATIONS (leave blank if not applicable) 



This information is continued on an additional sheet □



Arabidopsis thaliana genomic DNA of GA4H1 and 
GA4H2 genes cloned into pBSKS(+) 
(Stratagene) vector pLVN103 in DH5α



D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)



E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)



The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g.,
"Accession Number of Deposit")







☒ This sheet was received with the international application


□ This sheet was received by the International Bureau on: 


Authorized officer

paul f. mom 


Authorized officer 
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What Is Claimed Is: 

1. A purified DNA molecule comprising a DNA sequence encoding
the amino acid sequence of a GA4 homologue.

2. The DNA molecule of claim 1 encoding the amino acid sequence
of GA4H1 in Figure 6 (SEQ. ID. No. 7).

3. The DNA molecule of claim 1 encoding the amino acid sequence
of GA4H2 in Figure 8 (SEQ. ID. No. 10).

4. The DNA molecule of claim 1, wherein said DNA is selected from
the group consisting of the genomic DNA's, SEQ ID No. 6 in Figure 6, SEQ ID
No. 9 in Figure 8, cDNAs having SEQ ID No. 5 in Figure 5, SEQ. ID. No. 8 in
Figure 7 and a degenerate variant of any of said sequences.

5. A DNA molecule comprising a sequence with at least 95% 
homology to the DNA sequence in any one of claims 1-4. 

6. A vector comprising the sequence of claim 5.

7. A host transformed with the vector of claim 6. 

8. The host of claim 7, wherein said host is selected from the group 
consisting of bacteria, yeast, plants, insects or mammals. 

9. The host of claim 8, wherein said host is a plant cell. 

10. The host of claim 9, wherein said plant cell is a dicotyledonous
plant cell.

11. A plant regenerated from the plant cell of claim 10.

12. Progeny of the plant of claim 11.

13. A propagule of the plant of claim 11.

14. A seed produced by the progeny of claim 11.

15. Purified GA4H protein.

16. The protein of claim 15, wherein said GA4H protein is an 
Arabidopsis protein. 

17. The GA4H protein of claim 15, wherein said GA4H protein is 
selected from the group consisting of GA4H1 comprising the amino acid 
sequence shown in Figure 6 (SEQ. ID NO. 7), GA4H2 comprising the amino acid 
sequence shown in Figure 8 (SEQ. ID NO. 10) and a functional derivative of said 
sequences. 

18. The GA4H protein of any one of claims 15-17, wherein said
GA4H protein is substantially free of other A. thaliana proteins.

19. A cell extract comprising a GA4H protein.

20. The cell extract of claim 21, wherein said GA4H protein is an 
Arabidopsis protein. 

21. The cell extract of claim 26, wherein said GA4H protein 
comprises the amino acid sequence selected from the group consisting of Figure 
6 (SEQ. ID NO. 7), Figure 8 (SEQ. ID. NO. 10) and a functional derivative of
said sequences. 



WO 98/59057 



PCTVUS98/D044 



-73- 

22. The cell extract of any one of claims 19-21 wherein said cell is a 
prokaryotic cell or a eukaryotic cell. 

23. The cell extract of claim 22, wherein said prokaryotic cell is an E.
coli.

24. The cell extract of claim 22, wherein said eukaryotic cell is a yeast, 
fungal, insect, mammalian or transgenic plant cell. 

25. A cell extract comprising A. thaliana GA4H protein, wherein said
cell is not A. thaliana.

26. A method of making GA4H protein wherein said GA4H protein
is substantially free of other A. thaliana proteins, said method comprising:

a) transforming a prokaryotic or eukaryotic cell with a GA4H
recombinant expression vector encoding a GA4H protein or
a functional derivative of a GA4H protein,

b) expressing said GA4H protein, and

c) isolating said GA4H protein substantially free of other A.
thaliana proteins.

27. The method of claim 26 wherein said GA4H protein is isolated 
from E. coli inclusion bodies. 

28. A method of directing the expression of a gene in a plant, such that
said gene has the same temporal and spatial expression pattern of a GA4H, said
method comprising the steps of:

1) operably linking said gene to the regulatory sequences of
GA4H to create an expression module, and
2) transforming said plant with said expression module of
part (1).
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29. A method of modulating the translation of RNA encoding GA4H 
in a plant comprising the steps of: 

1) generating an expression vector encoding antisense GA4H
RNA;

2) transfecting said plant with said expression vector of part 
(1). 

30. An isolated DNA construct wherein said construct consists 
essentially of a nucleic acid sequence, and wherein said nucleic acid sequence: 

1) encodes GA4H polypeptide, and

2) hybridizes to the sense or antisense sequence of the GA4H 
DNA when hybridization is performed under stringent 
hybridization conditions. 

31. An isolated DNA molecule encoding a GA4H protein, said DNA
molecule prepared by a process comprising:

1) hybridizing a desired DNA molecule to the sense or
antisense sequence of a GA4H DNA sequence, wherein
the hybridization is performed under stringent
hybridization conditions;
2) selecting those DNA molecules of said population that
hybridize to said sequence; and

3) selecting DNA molecules of part (2) that encode
said GA4H protein.

32. An isolated DNA molecule encoding a GA4H protein as claimed
in claims 30 or 31, said DNA molecule prepared by a process comprising:

1) prehybridizing for 1 hour;

2) hybridizing overnight at 65°C in the hybridization buffer;
and
3) washing once for 15 minutes in 2xSSC at room
temperature, then two times for 30 minutes in 0.1xSSC
and 0.1% SDS at 60°C.

33. A method of cloning a DNA molecule that encodes a GA4H
protein, said method comprising: 

1) hybridizing a desired DNA molecule to the sense or 
antisense sequence of GA4H DNA wherein the 
hybridization is performed under stringent hybridization 
conditions; 

2) selecting those DNA molecules of said population that 
hybridize to said sequence; 

3) transforming said DNA of part (2) into a host cell; and 

4) selecting transformants that express said GA4. 

34. The method of claim 33 wherein the hybridization conditions 
consist essentially of: 

1) prehybridizing for 1 hour; 

2) hybridizing overnight at 65°C in the hybridization buffer;
and

3) washing once for 15 minutes in 2xSSC at room
temperature, then two times for 30 minutes in 0.1xSSC
and 0.1% SDS at 60°C.

35. A method of altering stem elongation, said method comprising 
inserting a DNA construct encoding the amino acid sequence of a GA4H protein 
into a transgenic plant. 

36. A method of producing a transgenic dwarf plant said method 
comprising transforming a plant with the antisense or sense construct of a GA4H 
gene or cDNA. 
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37. A method of making GA4H protein wherein said GA4H protein 
is substantially free of other A. thaliana proteins, comprising:

a) transforming a prokaryotic or eukaryotic cell with a GA4H 
recombinant expression vector encoding a GA4H protein, 

b) expressing said GA4H protein, and 

c) purifying said GA4H protein substantially free of other A. 
thaliana proteins. 

38. An antibody or fragment thereof, capable of binding a GA4H
protein.

39. A method of identifying cells or tissues expressing GA4H 
comprising the steps of: 

a) incubating said cells or said tissues with an agent capable of binding to 
the GA4H protein or the RNA encoding GA4H; and 

b) detecting the presence of the bound agent. 

40. The method of claim 39 wherein said agent capable of binding to 
the GA4H protein is an antibody or fragment thereof. 
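
Claim 5 recites at least 95% homology to a DNA sequence, but the claims do not prescribe how that percentage is to be calculated. A minimal sketch, assuming homology is taken as simple percent identity over two pre-aligned sequences of equal length (an assumption made here for illustration, not a definition drawn from the claims), might look as follows. The function name `percent_identity` is hypothetical, and the second sequence in the usage example is SEQ ID NO: 13 with its final base changed for illustration only.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions at which two equal-length sequences agree.

    Whitespace is ignored and case is normalised. No alignment is performed,
    so both sequences are assumed (hypothetically) to be aligned already.
    """
    a = "".join(seq_a.split()).upper()
    b = "".join(seq_b.split()).upper()
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    reference = "GTGGTTAGCA CTAAATTCAC"   # SEQ ID NO: 13 as given in the listing
    variant = "GTGGTTAGCA CTAAATTCAT"     # hypothetical variant, last base changed
    print(percent_identity(reference, variant))
```

With one mismatch in 20 bases the sketch reports 95.0, which sits exactly at the threshold recited in claim 5 under this simplified reading. Sequences of unequal length would first need an alignment step, which is outside the scope of this sketch.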
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1 ATAAGAAAAAAAACACAAACATCTATCAAATTTACAAAGTTTTAAAACTAATTAAAAAAG 60 

61 AGCAAGATGCCTGCTATGTTAACAGATGTGTTTAGAGGCCATCCCATTCACCTCCCACAC 120

1 MPAMLTDVFRGHPIHLPH 18

121 TCTCACATACCTGACTTCACATCTCTCCGGGAGCTCCCGGATTCTTACAAGTGGACCCCT 180

19 SHIPDFTSLRELPDSYKWTP 38

181 AAAGACGATCTCCTCTTCTCCGCTGCTCCTTCTCCTCCGGCCACCGGTGAAAACATCCCT 240

39 KDDLLFSAAPSPPATGENIP 58

241 CTCATCGACCTCGACCACCCGGACGCGACTAACCAAATCGGTCATGCATGTAGAACTTGG 300

59 LIDLDHPDATNQIGHACRTW 78

301 GGTGCCTTCCAAATCTCAAACCACGGCGTGCCTTTGGGACTTCTCCAAGACATTGACTTT 360

79 GAFQISNHGVPLGLLQDIEF 98

36 1 C TC ACCGG TAG TCTCT TCGGGCT ACCTGTCCAAGGCAAGC T T AAG T C TGCTCGGTCGGAG 420 

99 L T G S L F G L P V 0 R K L K S A R S E 118 

42 1 ACAGGTGTGTCOGGCTACGGCGTCGCTCGT ATCGCATC T T TCT TCAAT AAGCAAATGTGG 480 

119 T G V S G Y G V A R 1 A $ F F N K Q M W 138 

481 TCCGAAGGTTTCACCATCACTGGCTCGCCTCTCAACGAT T TCCGT AAACTTTGGCCCCAA 540 

139 SECFT 1 TGSPLNDFRKLWPQ 158 

54 1 CATCACCTCAACTACTGCGATATCGTTGAAGAGTACGAGGAACATATGAAAAAGTTGGCA 600 

159 H H L N Y C D I VEEYEEHMKKLA 178 

601 TCCAMTTCAT j^AGCAUAMnMCTTC^^ 660 

179 S K L MWLALNSLGVSEEDIEW 198 

661 toGTCTCAGTTCAGA I 1 I AAACTGGGCCCAAGC iCCTCTCCAGCTAMTCATTACC^ 720 

199 A S L S S D L N W A Q A A L 0 L N H Y P 218 

721 IG 1 1 1GTCCTGAACCGG ACCGAGCCATGGGTQ TAGCAnnTr.ATArrnAPTrPAPPrTrrTA 780 

219 V C P E P D R A M G L A A H T D S T L L 238 

78 1 ACCATTCTGTACCAGAACAATACCGCCGGTCTACAAGTATTTCGCGATGATCTTGGTTGG 840 

239 T 1 LYQNNTAGLQVFRDDLGW 258 

841 GTCACCGTGCCACCGTTTCCTGGCTCGCTCGTGGTTAACGTTGGTGACCTCTTCCACATC 900 

259 V T V P P F P G S L V V N V G 0 L F H I 278 

901 C T ATCC AATGG A T TG T T T AAAAGCG TG T TGC ACCGCGC TCGGG T T AACCAAACCAG AGCC 960 

279 L S N G L F K S V L H R A R V N Q T R A 298 

FIG.1A
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961 CGGTTATCTGTAGCATTCCTTTGGGGTCCGCAATCTGATATCAAGATATCACCTGTACCG 1020 

299 RLSVAFLWGPQSDIK ISPVP 318 

1021 AAGCTGG T TAGTCCCGT TGAATCXJCCTCTATACCAATCGG TG ACATGGAAAGAGTATCT T 1080 

319 K L V S P V E S P L Y Q S V T W K E Y L 338 

1081 CGAACAAAAGCAAC TCACT TCAACAAAGCTCT TTCAATG AT T AGAAATCACAGAGAAGM 1 140 

339 RTKATHFNKAL S M I RNHREE 358 

1 141 TG AT T AGAT AAT AAT AG T TG TG ATCTACT AG T TAG T T TG AT T AA T AAAT TG T TG T AAATG 1200 

1201 ATTTCAGCAATATGATTTGTTTGTCCTC ' * 12 28 



FIG.1B 
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1 gltlolglgo Igogcotccc otlclclcol logltcocoo gtcATGCCTT 

51 CACTAGCAGA AGAGATATGT ATTGGTAACT TAGGCAGTCT CCAAACACTC 

101 CCCGAGTCGT TCACCTGGAA ACTCACAGCC GCCGACTCCC TTCTGCGTCC 

151 CTCCTCCGCC GTCTCATTCG ACGCAGTGGA AGAGTCCATT CCTGTGATCG

201 ACCTCTCTAA TCCTGACGTT ACCACCCTCA TTGGAGATGC CTCCAAAACA 

251 TGGGGAGCGT TTCAGATAGC CAACCACGGG ATTTCTCAGA AGCTTCTCGA

301 TGATATCGAG TCTCTGTCCA AAACCCTATT CGACATGCCG TCAGAGAGGA 

351 AGCTTGAAGC GGCTTCCTCC GATAAAGGAG TTAGTGGCTA CGGAGAACCT

401 CGAATCTCCC CCTTTTTCGA GAAGAAAATG TGGTCTGAAG GGTTTACTAT 

451 TGCCGATGAC TCCTACCGCA ACCATTTCAA TACTCTTTGG CCTCATGATC 

501 ACACCAAGTA CTGCGGTATA ATCCAAGAAT ACGTGGACGA AATGGAAAAA 

551 TTAGCAAGCA GACTTCTGTA TTGCACATTA GGCTCACTTG GTGTCACCGT 

601 GGAAGACATT GAATGGGCTC ACAAGCTAGA GAAATCTGGA TCAAAAGTGG 

651 GCAGAGGCGC CATACGACTA AACCACTACC CGGTTTGTCC TGAACCAGAA 

701 CGAGCCATGG GTCTAGCCGC TCATACAGAC TCCACTATCC TAACCATTCT 

751 GCACCAGAGC AACACGGGAG GGCTACAAGT GTTCAGGGAA GAGTCCGGTT 

801 GGGTCACGGT TGAGCCGGCT CCTGGTGTCC TCGTGGTCAA CATGGGTGAT

851 CTCTTTCACA TCTTATGGAA CGGGAAAATC CCAAGCGTGG TTCATCGAGC 

901 CAAAGTTAAC CATACTGGGT CAAGAATTTC GATTGCGTAC TTATGGGGTG 

951 GTCCAGCTGG TGATCTGCAA ATCGCACCTA TCTCTAAGTT AACCGGTCCG 

1001 GCTGAACCGT CTCTTTACCG GTCAATTACA TGGAAAGAGT ATCTCCAAAT 

1051 AAAGTATGGG GTTTTCGACA AGGCCATGGA CGCAATTAGG GTCGTTAATC 

1101 CCACCAATlo gotctccttc tcoooloctc lei 

FIG.5 
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GA-P15 GA-P14S 

^catotgtgtgtogtotctatgcototototccooogtaaUgttlotglgotgogcalc 60 

ccottctctcatlogttcacoagt c ATGCCT TCACT AGCAGAAGAGAT ATG TATTGGTAA 120 

MPSLAEEICIGN 12 

C T T AGGCAG TC TCCAAACAC TCCCCGAG TCGT TCACCTGG AAACTCACAGCCGCCGACTC 180 

LGSLQTLPESFTWKLTAADS 32 

CCT TCTGCGTCCCTCCTCCGCCG TCTCAT TCGACGCAG TGG A AG AG TCC AT TCC TG TG A T 240 

L L R P S S A V S F 0 A V E E S I P V I 52 

CGACCTCTCTAATCCTGACGTTACCACCCTCATTGGAGATGCCTCCAAAACATGGGGAGC 300 

DLSNPDVTTL IGDASKTWGA 72 

G T TTCAGATAGCCAACCACGGGAT T TC TCAG A AGC T TC TCG ATG AT ATCG AG TC TC TG TC 360 

FQIANHGISQKLLDDIESLS 92 

CAAAACCCTATTCGACATGCCGTCAGAGAGGAAGCTTGAAGCGGCTTCCTCCGATAAAGG 420 

KTLFDMPSERKLEAASSOKG 112 

AGTTAGTGGCTACGGAGAACCTCGAATCTCCCCCTTTTTCGAGAAGAAAATGTGGTCTGA 480 

VSGYGEPRISPFFEKKMWSE 132 

GA-P2 GA-P13 

AGGG T T TACT A T TGCCG ATG AC TCCT ACCGCAACCAT T TCAAT AC TC T T TGGCC TCATGA 540 

GF T I ADDSYRNHFNTLWPHO 152 

TCACACCAAGTACTGgtaacgtctoUocacacocotatototQUUllgctlaUtcg 600 

H T K Y W 157 

coooogtgtggcoooggoooUgcocacUtmutgcoctoogocitogUoUatto 660 

ooaglgtUaaotgttUtUclgUcotaoaooogtgmatatgttccgogtaoUga 720 

tgtltotgoltaglgotaoctgotaococQtogoglgtogccUcooagtttcloottoo 780 

ologUtgogcoocalccttoloUtlotgoogtagtocltcttattgcotoUocogco 840 

ooUooogtoccoooglctclotgaoolgtgotootUggctootgtcgoggtcttooco 900 

tlogottoccaoooocclloolloclglooottgtoUtgcttttcotUUggtoltgt 960 

gc o gCGGTAT AATCCAAGAATACGTGGACGAAATGGAAAAAT TAGCAAGCAGACTTCTGT 1020 

Gl IQEYVDEMEKLASRLLY 176 
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ATTGCACATTAGGCTCACTTGGTGTCACCGTGGAAGACATTGAATGGGCTCACAAGCTAG 1080 

C T L G S L G V T V E 0 I E W A H K L E 196 

AGAAATCTGGATCAAAAGTGGGCAGAGGCG CCATACGACTAAACCACTAC CCGGTTTGTC 1 1 40 

K S G S K V G R G A I R L N H Y P V C P 216 

GA-P17 

CTGAACCAGAACGAGCCATGGGTCTAGCCGCTCATACAGACTCCACTATCCTAACCATTC 1200 

EPERAMGLAAHTDSTILTIL 236 

TGCACCAGAGCAACACGGGAGGGCTACAAGTGTTCAGGGAAGAGTCCGGTTGGGTCACGG 1260 

HQSNTGGLQVFREESGWVTV 256 

TTGAGCCGGCTCCTGGTGTCCTCGTGGTCAACATGGGTGATCTCTTTCACATCTTATCGA 1320 

EPAPGVLVVNMGOLFH I LSN 276 

ACGGG AAAATCCCAAGCG TGGTTCATCGAGCCAAAG T T AACCAT AC TCGG TCAAGAAT T T 1380 

G K 1 P S V V H R A K V N H T R S R I S 296 

CGATTGCGTACTTATGGGGTGGTCCAGCTGGTGATGTGCAAATCGCACCTATCTCTAAGT 1440 

I A Y L W G G P A G D V Q I A P I S K L 316 

TAACCGGTCCGGCTGAACCGTCTCTTTACCGGTCAATTACATGGAAAGAGTATCTCCAAA 1500 

T G P A E P S L Y R S I T W K E Y L Q I 336 

TAAAG T ATGAGG TT TTCGACAAGGCCATGGACGCAAT TAGGGTCG T T AATCCCACCAAT t 1560 

KYEVFDKAMOA I RVVNPTN 355 

oootctccttctcogotoclctcttooigooooocctooottoootqcqo 1610 
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1 tcoloccooo oocataoogc coooolaloo acocotoagc cttllogcAT 

51 GAGTTCAACG TTGAGCGATG TGTTTAGATC GCATCCCATT CACATCCCAC 

101 TCTCAAACCC ACCTGACTTC AAATCTCTCC CGGATTCTTA CACGTGGACT 

151 CCTAAAGATG ATCTCCTCTT CTCCGCCTCC GCCTCCGACG AAACCCTGCC 

201 GCTCATCGAC CTCTCCGATA TCCACGTGGC CACTCTTGTG GGCCATGCTT 

251 GTACCACGTG GGGAGCGTTC CAGATCACCA ACCACGGCGT CCCCTCGGGA 

301 CTTCTCGACG ACATTGAGTT CCTCACCGGA AGTCTTTTCC GGCTTCCCGT 

351 ACAGCGGAAG CTCAAGGCGG CTCGGTCAGA GAATGGCGTG TCCGGCTACG 

401 GCGTAGCTCG TATTGCTTCG TTCTTTAATA AGAAGATGTG GTCCGAAGGT 

451 TTCACCGTTA TTGGCTCTCC CCTCCACGAT TTCCGTAAAC TCTGGCCCAG 

501 CCACCACCTC AAATACTGTG AAATTATTGA AGAGTATGAA GAACATATGC 

551 AAAAGTTGGC AGCCAAGTTG ATGTGGTTCG CATTAGGTTC ACTGGGAGTT 

601 GAAGAAAAGG ACATACAATG GGCCGGGCCT AATTCAGACT TTCAAGGAAC 

651 CCAAGCAGCT ATCCAACTAA ACCATTATCC AAAATGTCCA GAACCAGACA 

701 GAGCCATGGG CCTCGCAGCC CATACAGACT CGACCCTCAT GACCATTCTG 

751 TACCAGAACA ACACCGCCGG TCTCCAAGTT TTCCGGGATG ACGTGGGCTG 

801 GGTTACCGCG CCACCTGTCC CTGGCTCGCT GGTGGTCAAC GTCGGTGACT 

851 TGCTCCACAT TTTAACCAAC GGAATCTTCC CGAGCGTGCT TCACCGAGCC 

901 AGGGTTAACC ACGTCCGATC TCGGTTCTCA ATGGCTTACC TGTGGGGTCC 

951 ACCATCCGAT GTAATGATCT CTCCACTTCC CAAACTGGTT GATCCTCTCC 

1001 AATCTCCTCT CTACCCATCT CTCACTTGGA AACAATACCT TGCTACCAAA 

1051 GCTACTCATT TTAATCAATC TCTTTCCATT ATTAGAAATl ggclgtcUc 

1101 cgoct 

FIG.7 
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GA-P27E 

Icoccgolctotoootococtcclcltctccoccooooqlotcototcotoccoooooco 

looagccooootolooacocotoogcc 1 1 1 1 agcATGAGTTCAACGT TGAGCGATGTGT T 

MSSTLSOVF 

TAGATCGCATCCCATTCACATCCCACTCTCAAACCCACCTGACTTCAAATCTCTCCCGGA 
R S H P I H I P L S N P P 0 F K S L P 0 

T TCT TACACGTGGACTCCTAAAGATGATGTCCTCTTC TCCGCCTCCGCCTCCGACGAMC 
SYTWTPKDDLLFSASASDET 

CC TGCCGCTCATCGACCTCTCCGATATCGACG TGGCCACTCT TG TGGGCCATGCT TGTAC 
LPL IOLSDIHVATLVGHACT 

CACG TGGGGAGCGT TCCAGATCACCAACCACGGCGTCCCCTCGCGACTTCTCGACGACAT 
TWGAFQI TNHGVPSRLLOOI 

GA-P20 



TG AG TTCCTCACCGGAAGTC TT T TCCGGGT TCCCGT ACAGCGGAAGC TCAAGGCGGCTCXJ 
EFITGSLFRLPVQRKLKAAR 

G TCAG AGAATGGCG TCTCCGGC T ACGGCG T AGCTCG TAT TGCTTCG T TCT TT AAT AAGAA 
SENGVSGYGVARIASFFNKK 

GATGTGGTCCGAAGGTTTCACCGTTATTGGCTCTCCCCTCCACGATTTCCGTAAACTCTG 
MWSEGFTVIGSPLHDFRKLW 

GCCCAGCCACCACCTCAAATACTGgtotctUtlcaatggUcaUtlolcoocqUoaq 
PSHHLKYW 

occotottaocgtaocgtaocUolctUglolgaoooooooooooaoooctgtggocgt 
loglocogtlgoctotlcoollgotologoUcgggootootocgoooogggtooogtog 
ooaccolttUlgccolgtcglogllogloaaoogcacoatgooooclcotggocccocc 

ooooogoUocolgotol oototoiolototototttototoootoHototoototoiT 
GA-P19 

tototootoUalglgcoooaoUooolgoooaloooloUotcoggogoalgtgoooto 
cog t o t oogo tltlcclU ggc t ocot gocgot llctotogolll googg Uoaga loc t 

oot Ucotot lotcgot tcoocogTGAAATTATTGAAGAGTATGAAGAACATATGCAAAA 

EIIEEYEEHMQK 

G T TGGCAGCCAAGT TGATGTGG TTCGCAT TAGGTTCaCTGGGAG TTGAAGAAAAGGACAT 
LAAKLMWFALGSLGVEEKD I 

FIG.8A 
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acmtgo;ccgggcctmttcagactttcaaggmcc(m;cagctatccaactaaacca 1 140 

QWAGPNSOFOGTQAAIQLNH 209 

TTATCCAAAATGTCCAGAACCAGACAGAGCCATGGGCCTCGCAGCCCATACAGACTCGAC 1200 

YPKCPEPORAMGLAAHTDST 229 

CCTCATGACCATTCTGTACCAGAACAACACCGCCGGTCTCCAAGTTTTCCGGGATGAGGT 1260 

LMT I LYONNTAGLOVFROOV 249 

GGGCTGGGTTACCGCGCCACCTGTCCCTGGCTCGCTGGTGGTCAACGTCGGTGACTTGCT 1320 

GWVTAPPVPGSLVVNVGDLL 269 

CCACAT T T T AACCAACXXJAATCT TCCCGAGCG TGCT TCACCG AGCCAGGGT TAACCACG T 1380 

Hli-TNGIFPSVLHRARVNHV 289 

CCG ATCTCGG T TCTCAATGGC T T ACCTG TGGGG TCCACCATCCGATG T AATG ATC TCTCC 1440 

RSRFSMAYLWGPPSDVMI SP 309 

ACTTCCCAAACTGGTTGATCCTCTCCAATCTCCTCTCTACCCATCTCTCACTTGGAAACA 1500 

L P K L V D P L Q S P L Y P S L T W K Q 329 

ATACCTTGCTACCAAAGCTACTCATTTTAATCAATCTCTTTCCATTATTAGAAAT tooc I 1560 

YLATKATHFNQSLSI I R N 347 

gtctt ccgoctgaoUtcUgotU tcogoUUoctotUotlUcUogloolotqol 1620 
GA-P21 

gotolctaUoctgUtcgollttogotgogtggttcUcaootlcocoatlogtogctt 1680 

OOlottgotl * " icon 
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